Abstract

The statistical mechanics of Gibbs is a juxtaposition of subjective, probabilistic ideas on the one
hand and objective, mechanical ideas on the other. From the mechanics point of view, the term 'statistical 
mechanics' implies that to solve physical problems, we must first acknowledge a degree of uncertainty as to 
the experimental conditions. Turning this problem around, it also appears that the purely statistical arguments 
are incapable of yielding any physical insight unless some mechanical information is first assumed. In this 
paper, we follow the path set out by Jaynes [25], including elements added subsequently to that original work,
to explore the consequences of the purely statistical point of view. Because of the amount of material on this 
subject, we have found that an ordered presentation, emphasizing the logical and mathematical foundations, 
removes ambiguities and difficulties associated with new applications. In particular, we show how standard 
methods in the equilibrium theory could have been derived simply from a description of the available problem 
information. In addition, our presentation leads to novel insights into questions associated with symmetry and 
non-equilibrium statistical mechanics. Two surprising consequences to be explored in further work are that 
(in)distinguishability factors are automatically predicted from the problem formulation and that a quantity 
related to the thermodynamic entropy production is found by considering information loss in non-equilibrium 
processes. Using the problem of ion channel thermodynamics as an example, we illustrate the idea of build- 
ing up complexity by successively adding information to create progressively more complex descriptions of 
a physical system. Our result is that such statistical mechanical descriptions can be used to create transparent, 
computable, experimentally-relevant models that may be informed by more detailed atomistic simulations. 
We also derive a theory for the kinetic behavior of this system, identifying the nonequilibrium 'process' free 
energy functional. The Gibbs relation for this functional is a fluctuation-dissipation theorem applicable arbi- 
trarily far from equilibrium, that captures the effect of non-local and time-dependent behavior from transient 
driving forces. Based on this work, it is clear that statistical mechanics is a general tool for constructing the 
relationships between constraints on system information. 
1 Introduction

If the foundation of thermodynamics is to be built on processes existing in the physical world, then the whole 
structure of the theory will be subject to constant revision as new physics is discovered. This, however, has 
not proven to be the case. Rather, as new mechanistic information is added, statistical mechanics persists in 
an identical form, with changes only in the meaning attached to system states and measurement outcomes. It 
follows that, as in the case of the geometry of Euclid, statistical mechanics does not describe objects actually 
existing in the physical world, but rather idealizations of them [55]. This distinction immediately explains why
the structure of statistical mechanics has persisted throughout the developments of the last century. Because 
its basic axioms are conventions chosen to be logically consistent and in agreement with our intuition, the 
mathematical form of the theory operates as a device for carrying out extended logic. 

There have, to date, been many examples of using logical inference for framing statistical mechanical 
questions. Perhaps the most widely known is in the gradual shift in the conceptualization of an "ensemble." 
Early ideas, associated with the names of Maxwell, Boltzmann, and others, were based on physically realiz- 
able systems with many weakly interacting particles, i.e. gases. The theory simply stated that an examination 
of all the particles at a single instant revealed the statistical properties of the ensemble. Gibbs [17] adapted the
concept to systems that may contain strong internal interactions, e.g. solids or condensed phases, by imagining 
the ensemble as an infinite number of physical replicas of the system. His subjective conceptualization can be 
seen in his definition of the laws of thermodynamics as expressing "the approximate and probable behavior 
of systems of a great number of particles, or, more precisely, . . . for such systems as they appear to beings 
who have not the fineness of perception to enable them to appreciate quantities of the order of magnitude of 
those which relate to single particles, and who cannot repeat their experiments often enough to obtain any but 
the most probable results." It was immediately clear that for developing a subjective, formal treatment of the 
probability distribution over phase and its consequences, 'hypotheses concerning the constitution of matter' 
would not be required except in working out special cases. 

The shift toward a subjective interpretation occurred only gradually because of the combination of Gibbs'
modest personality [48] and a dispute between Gibbs and his contemporaries [15], who viewed the physical
reason for the weak coupling between ensembles which brought about equilibrium as paramount. Even as
Schrödinger [62] presented a maximum entropy derivation of the canonical ensemble similar to the modern
treatment of Jaynes [33], he still found it necessary to seek a middle ground by considering such distractions
as the physical realizability of infinite heat baths. The work of Jaynes [25, 34, 33] and others [18] went a great
deal toward clarifying the situation by making a distinction between the "delusion that an ensemble describes
an 'objectively real' physical situation" [34] and the subjective question of determining the "agreement between
the premises and the conclusions" [17]. However, the philosophical debate over objective vs. subjective
interpretations of thermodynamics continues to date [67]. Not surprisingly, attempts to prove ergodicity and
convergence to maximum entropy distributions using mechanical arguments show that the most robust route
is to introduce some form of uncertainty [71, 46].

Perhaps the strongest criticism of this approach is associated with the use of the term, 'subjective.' This 
term seems to imply that the results of the theory cannot be considered as objectively existing in reality. 
Nevertheless, experiments are able to compare work and heat values to find agreement with thermostatics, 
provided a given system behaves according to the assumptions. In exactly the same way, Euclid's geometry is 
able to deduce physically measurable distances, provided these objects behave as ideal solids. Subjectivity is 
present in both of these cases because assumptions are always required in order to calculate one quantity from 
another. The term 'subjective' simply acknowledges that this reasoning process proceeds from assumptions 
derived from experience. Physical predictions of objectively real phenomena can be made from a subjective 
theory based on assumptions that are objectively correct. However, imputing objectivity to assumptions used 
to solve a particular problem makes it impossible to conceive of possible changes in prior information and 
has given rise to some of the most difficult paradoxes in science. 

In this paper, we aim to derive the statistical aspect of thermodynamics from the logical foundation given 
by Jaynes [33] in sufficient detail to present applications to modern problems outside the realm of the
equilibrium canonical ensemble. Although some of the most important results of this inquiry have already been
presented by Gibbs, we find that a derivation from first principles clarifies the logical foundations of the theory. 
A similar derivation for the canonical ensemble from first principles [33] exemplifies the generality that such 
a theory may attain; however, several important questions remain unaddressed. First and foremost, the form 
of the canonical ensemble must change when new degrees of freedom are added. This addition corresponds 
to a change in the prior information for the problem, and it is not immediately evident how both problems can 
be related. We have found that directly attacking this change in the number of possible 'states' of a system
requires a type of logical relativity theory, which results from rejecting the two-valuedness of elementary
hypotheses. We address this theory in Sec. 2 and provide the most important details in the Appendix.

Next, we introduce the partition function and entropy functionals in Sec. 3, from which the elementary
theory of statistical mechanics becomes manifest. A relative entropy functional is in most respects simpler 
than the free energy as it may be deduced from purely statistical considerations. The basis of this functional 
in information theory shows that thermodynamic states are characterized not by an objective physical situ- 
ation, but instead by subjective information about the system. Further, logically consistent predictions from 
statistical mechanics will disagree with the results of experiment whenever physically incorrect assumptions 
are made about the state of the system. A novel result of this section is that thermodynamic free energies 
fundamentally express the log-likelihood of a state of knowledge. Although the absolute probability is un- 
defined without specifying all possible states, the likelihood ratios between states may be deduced, and are 
exactly the exponentials of free energy differences. Although the partition function and entropy are defined 
as 'state-functions,' dependent on a state of knowledge, we justify the fundamental importance of likelihood 
ratios by confirming that the partition function can be built from a product of such likelihood ratios in any 
order, leading to the conception of thermodynamic cycles. This result also proves that the resulting probability 
distribution is a state function, as it should be. The assumptions required for constructing the relationships 
between informational states (the laws of statistical mechanics) are thus founded on probability theory.

We will develop the ion channel as an example problem for applications (Sec. 4) of the basic set of
relations given in Sec. 3. Transmembrane proteins that shuttle solutes between two aqueous/membrane
interfaces have drawn the attention of a large crowd of experimental and theoretical investigators [22].
Selective channels and transporters are critical for maintaining living cells in their nonequilibrium state. Similar
functionality is a required ingredient of synthetic semi-permeable partitions, used in fuel cells, solute separa- 
tion, and electrochemical sensing. The operational characteristics of these devices are determined from their 
response to applied pressure, electric fields, and solute concentration differences. The most easily measured 
response is ion conduction, available through current measurements that can be carried out on
micrometer-sized patches at millisecond resolution [20, 68]. Conduction of other species, such as water, as well as
structural changes in the channel and surrounding interface regions are also important, but less accessible. The
most easily accessible theoretical descriptions of channel behavior center around the structural properties 
of the equilibrium state and its propensity for ion occupancy under no external bias (in non-conducting 
conditions) [61]. Instead of presenting a patchwork of accumulated techniques in statistical mechanics, in this
article we present a top-down view by successively adding mechanistic information to predict these
propensities. This allows a construction of the simplest possible physical interpretation of channel behavior, but uses a
statistical mechanics capable of deriving all the complexities of atomistic and quantum-mechanical systems.
Because no net currents are present at equilibrium [30], the fluxes in these systems must be analyzed using a
nonequilibrium theory. 

The usual Kolmogorov definition of probability and the Boltzmann factor are developed in Secs. 4.1
and 4.2. The latter is derived by a maximum relative entropy argument along a path in a thermodynamic
cycle. For the special case of maximum relative entropies, we find that the entropy increments add to the total 
information entropy of each state, proving path independence of the entropy for this case. These two cases 
generate the usual equilibrium thermodynamics approach without the necessity of assuming extensivity. We 
will show that this approach can be used directly to give a naive distribution over ion occupancy states for 
the channel. We will show later that this distribution is related to a coarse-grained (marginal) distribution at 
a state of knowledge with more degrees of freedom. However, both states of knowledge correctly employ the 
rules of statistical mechanics, and their difference lies in the assumed information for the problem. 

Adding new coordinates, the opposite of restricting them in Sec. 4.1, leads to the multicanonical ensemble,
presented in detail in Sec. 4.3. The new degrees of freedom are termed coarse coordinates. The relationship
between canonical and multicanonical ensembles is the usual one. Fixing coarse coordinates within the
multicanonical ensemble generates a conditional ensemble. The aggregate probability of the coarse coordinates
is related to the potential of mean force as in coarse-graining [38]. Although we could directly add
time-dependent states in this section, we have adopted a slower development, adding conformational states of
the channel at equilibrium. This allows an intuitive connection to the well-known equilibrium theory, where 
introduction of interacting systems can change the distribution over the system of interest. The analogous 
development in Ref. [25] is the derivation of the constant pressure or constant angular momentum ensemble.
Instead of describing everything in terms of all atomistic positions and momenta, or all atomic and electronic
eigenfunctions, we have shown here that these limits can be approached incrementally as necessary for each
application. 

In general, adding maximum relative entropy information along with new coordinates, Y, will change 
the distribution over the previous coordinates, X. In some cases, for example when the marginal distribution 
over X is experimentally known, this is not desirable. We instead seek a method for inference on Y from a 
known distribution over X and maximum entropy information. The method for constrained addition is derived 
in Sec. 4.4 by assuming a probability distribution for ion occupancy states in a channel, and then inferring
the distribution of channel conformations. Other important applications of the same theory are possible. In 
particular, it leads directly to the predictive statistical mechanics [30] of dynamic processes arbitrarily far
from thermodynamic equilibrium. The non-equilibrium process entropy (caliber) and free energy functionals 
follow naturally from a specification of non-equilibrium states as trajectories. The Gibbs relations for these 
functionals lead trivially to generalized fluctuation-dissipation theorems. Although based on ideas originally 
from Jaynes [27, 28], the work presented in this section differs in an important respect. By fixing the
distribution at an initial time and then maximizing the step-wise transition entropies we arrive at a non-anticipating
process. Previous presentations [27, 47, 14] utilize anticipating conditions, resulting in the possibility of
non-physical influences from forces which may be exerted on the system at future times. Removing this
shortcoming brings us closer to an original idea by Jaynes [26] and gives path probabilities immediately recognizable
as a canonical form for forward transition processes. Information loss in discarding the starting distribution 
in favor of the final distribution leads to a quantity analogous to the thermodynamic entropy production. This 
entropy has a great advantage over other formulations [12, 66] in that it does not explicitly require definition
of a 'steady-state.' Such a state may not be unique (as in the case for the Liouville equation) or even exist, e.g. 
transient processes like evaporation in an open system. It is our hope that this new development will complete 
the statistical foundations of thermodynamics by providing a basis for the second law of thermodynamics in 
information theory already hinted at in the problem of Maxwell's demon. PTfl 



2 Logical Foundations 

Jaynes [33] presents a cogent interpretation of probability theory as a method for conducting logical inference
in the presence of uncertainty. This interpretation is based on Polya's qualitative conditions for plausible
reasoning in mathematics [56] combined with the consistency theorems of Cox and Aczel [10, 2] deduced by
consideration of the associativity equation. Requiring our system for assigning plausibilities to be associative, 
such that adding information in any order leads to the same probability assignment, it is possible to deduce 
the product rule 

$$ P(AB|C) = P(A|BC)\,P(B|C) = P(B|AC)\,P(A|C), \tag{1} $$

for which the right equality is Bayes' theorem. The symbols, A,B, and C stand for hypotheses, or logical 
propositions, and the symbols on the right of the | represent given information, or assumptions. In this paper, 
we denote propositions using Greek or capital letters. This distinction is necessary to allow for propositions 
that represent coordinates, i.e. 

X: Some property of the system is described by the number x. 

Propositions always appear inside the probability symbol and follow the Boolean algebra, where multi- 
plication denotes a logical 'and,' while addition represents a logical 'or.' We refer the reader to the first few 
paragraphs of the Appendix for the necessary notation. Of particular importance is the relation AB = A when
$A \Rightarrow B$, used extensively to replace X with $X\mathbb{S}_X$. We also omit the prior information $I$ for clarity in some
instances, although it is always to be assumed in the formulas presented here.
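As a purely numerical illustration of the product rule in Eq. 1, the following minimal sketch checks both factorizations on a small table. The joint assignment `JOINT` and all function names are hypothetical choices made for this example; they are not taken from the paper.

```python
from itertools import product

# Hypothetical joint assignment P(AB|C) over two binary propositions.
# The numbers are illustrative only; they are not taken from the paper.
JOINT = {
    (True, True): 0.12,
    (True, False): 0.28,
    (False, True): 0.18,
    (False, False): 0.42,
}

def p_a(a):
    """P(A=a|C): sum the joint over both truth values of B."""
    return sum(p for (ai, _), p in JOINT.items() if ai == a)

def p_b(b):
    """P(B=b|C): sum the joint over both truth values of A."""
    return sum(p for (_, bi), p in JOINT.items() if bi == b)

def p_a_given_b(a, b):
    """P(A=a|B=b, C) from the definition of a conditional assignment."""
    return JOINT[(a, b)] / p_b(b)

def p_b_given_a(b, a):
    """P(B=b|A=a, C) from the definition of a conditional assignment."""
    return JOINT[(a, b)] / p_a(a)

def product_rule_holds(tol=1e-12):
    """Check P(AB|C) = P(A|BC)P(B|C) = P(B|AC)P(A|C) for every truth assignment."""
    for a, b in product((True, False), repeat=2):
        via_b = p_a_given_b(a, b) * p_b(b)
        via_a = p_b_given_a(b, a) * p_a(a)
        if abs(via_b - JOINT[(a, b)]) > tol or abs(via_a - JOINT[(a, b)]) > tol:
            return False
    return True

def bayes(a, b):
    """P(A|BC) obtained from the right-hand equality of the product rule."""
    return p_b_given_a(b, a) * p_a(a) / p_b(b)
```

Calling `product_rule_holds()` confirms that either order of conditioning recovers the same joint assignment, which is the associativity requirement stated above.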

An immediate question occurs as to how probabilities may be assigned in the first place. The most appealing
answer is to employ the principle of indifference (termed $I$), which states that, for any number of possible
outcomes, each is equally probable. However, this again raises the question of how the hypothesis
space is defined. Is any assignment possible in the absence of this knowledge? We assume that some assignment is
possible, and state it as $P(A|I) = \text{constant}$ for hypotheses, A, that are 'of the same type.' We provide a formal
justification for this process in the Appendix, and note that it extends some amount of inference to statements
that are undecidable, but does not affect the conditional assignments, $P(C|AI)$, when A says something about
C. 

The ability to reason in an undefined hypothesis space has some interesting consequences for the principle
of complementarity.pl | Suppose an infant is entertained by a screen that shows one color, $x_1$, $x_2$, or $x_3$, and
plays one sound, $y_1$, $y_2$, or $y_3$, at every moment. However, $x_1$ never occurs simultaneously with $y_1$, and so on
for $x_2$ and $y_2$, $x_3$ and $y_3$. As these ideas are being learned, it appears that $X_1$ and $Y_1$ are mutually contradictory
circumstances, and it should be possible for the infant to express some idea of the validity of that statement,
$X_1 \oplus Y_1$. Upon their first encounter with both $X_1 Y_1$ in the real world, the infant may be genuinely surprised.
Moreover, if $X_1$ represents the statement, 'color $x_1$ is present,' and the child believes all colors are mutually
exclusive ($\oplus(X_1, X_2, X_3)$), then $\bar{X}_1$ can be represented equally well with, 'any color other than $x_1$ is present.' It
may thus be ontologically true that both $X_1$ and $\bar{X}_1$ hold for a complex scene. [ 197] Following the argument of Poincaré
for establishing real numbers [55], suppose the screen is divided into smaller and smaller segments, and each
time the colors are found to be mutually exclusive in each segment. Then the act of defining, by recursion, a
continuous space on which colors are mutually exclusive will lead to an idea in contradiction with the nature
of light (but not with our measuring apparatus). The situation is immediately seen to be similar to the
double-slit experiment, where we must abandon our notion that a particle at position one, $X_1$, and at position two, $X_2$,
are mutually exclusive, and instead replace it with $X_1 + X_2 + (X_1 \oplus X_2)C$, where C denotes 'a measurement has
been taken to determine x.' These considerations relax the 'logical consistency' restrictions on what questions
may be asked of the quantum theory [52].

The problem of assigning relative probabilities to hypotheses concerning which sets of events may be 
possible is considered at length in the Appendix. Using the principle of indifference, the main result is that 
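the probability of a set of independent events, $\Omega$, is proportional to the number of events, $|\Omega|$, so that

$$
\begin{aligned}
P(x|\Omega I) &= \frac{1}{|\Omega|}, \quad x \in \Omega \\
 &= \frac{P(x|I)}{P(\Omega|I)} = \frac{\text{const.}}{P(\Omega|I)} \\
\Rightarrow P(\Omega|I) &= |\Omega|\,P(\varphi|I),
\end{aligned}
\tag{2}
$$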

where $\varphi$ stands for some elementary hypothesis. This result elegantly sweeps questions due to symmetry
under the rug, and these will be given more consideration in a subsequent paper. The discussion in the Appendix
already shows that this development provides a method for dealing with symmetric hypotheses in a very
simple format, fundamentally based on the principle of indifference. The appropriate '(in)distinguishability
factors' are derived as a result of its use in Sec. 4.1. From a statistical mechanics viewpoint, the principle
of indifference then provides partition functions, $Z[\Omega] = P(\Omega|I)/P(\varphi|I)$, consistent with completely
'entropic' systems. We show next that conventional partition functions may be obtained from these by moving
constraints on average values to the left-side of the probability symbol as well.
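The counting result of Eq. 2 can be sketched in a few lines. This is only an illustration under stated assumptions: the elementary probability `p_elementary` is a hypothetical placeholder for $P(\varphi|I)$, and the function names are invented for this example.

```python
from fractions import Fraction

def set_probability(omega, p_elementary):
    """P(Omega|I) = |Omega| * P(phi|I) under the principle of indifference."""
    return len(omega) * p_elementary

def partition_function(omega, p_elementary):
    """Z[Omega] = P(Omega|I) / P(phi|I); this reduces to the count |Omega|."""
    return set_probability(omega, p_elementary) / p_elementary

def conditional(x, omega):
    """P(x|Omega I) = 1/|Omega| for x in Omega, and zero otherwise."""
    return Fraction(1, len(omega)) if x in omega else Fraction(0)

# Hypothetical example: the even faces of a die as the allowed set.
EVEN_FACES = {2, 4, 6}
```

Note that `partition_function` returns the same value for any nonzero `p_elementary`, which reflects the remark that likelihood ratios, not absolute probabilities, carry the information.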



3 Minimal Relations of Statistical Mechanics 

Because we are deriving purely statistical relationships, the only things we are able to compare are states of 
knowledge. There may be several convenient computational notations or methods for solving the resulting 
equations, but it will not be necessary to read these as implying physically existing quantities or mechanisms. 
It is not the physical, causal mechanisms themselves, but rather theories about them that are the subject of the 
reasoning process. These theories may appear as propositions to be mutually compared or as given information 
for solving certain inference problems. However, unless physics appears in this way, it can have no influence 
on the solution of the logical problem. Mechanics enters because the set of coordinates and constraints relevant 
to any given hypothesis must be found from mechanical insight, and the answers resulting from statistics will 
depend non-trivially on this input. 

We claim that the state functions of statistical mechanics, the partition function and entropy functional, 
can be derived by successive addition or replacement of problem information. The first process can always be 
carried out, with a corresponding change in the probability distribution for system states via re-weighting the 
probability distribution from the previous state. In general, this process is uni-directional. The second process, 
replacing information, can only be carried out directly via re-weighting in certain circumstances. 

To develop our notation, we represent each state of knowledge by a set, $\mathcal{C} = \{A\}$, of propositions or
informational constraints, A. Important types of propositions include system coordinates, energy assignments,
and statistical weights. As we will see, the latter amount to propositions of the type "There is a physical
mechanism increasing the likelihood of state $x_1$ over $x_2$ by some amount," and are closely aligned with energy
assignments and the translation of problems with pre-specified coordinates to problems with pre-specified
physical forces. In addition to these, we also permit statements defining the set of coordinates relevant to
deciding a given proposition (the problem phase space) as well as the symmetries these propositions obey.
Although it may seem peculiar from a mechanistic point of view, any problem constraints can appear either 
as given information or as objects to be compared. We might also be able to remove the distinction between 
coordinates, energy assignments, and definitions of phase space to form so-called generalized ensemble or 
coarse-grained systems. 

We consider the process of adding information A to some known state, $\mathcal{C}$, which we denote by $\mathcal{C} \to A\mathcal{C}$.
The first quantity of interest is the probability $P(A|\mathcal{C}I)$, which we compare to an alternative process,
$\mathcal{C} \to \Phi\mathcal{C}$. It is convenient to assume the existence of a null hypothesis, $\Phi$, that is undecidable from any other
information. Formally,

$$ P(\Phi|\mathcal{C}I) = P(\Phi|\mathcal{C}'I) = P(\Phi|I) \quad \forall\, \mathcal{C}, \mathcal{C}'. \tag{3} $$

Now divide the set of propositions, $\mathcal{C}'$ (appearing above), into two sets, $\mathcal{D}$ and $\mathcal{C}$. From the two equivalent
ways of composing $P(\mathcal{D}\Phi|\mathcal{C}I)$ using Bayes' theorem, it is easy to see that the above is true if and only if $\Phi$
is irrelevant to conclusions about $\mathcal{D}$:

$$ P(\mathcal{D}|\Phi\mathcal{C}I) = P(\mathcal{D}|\mathcal{C}I). $$

We compute the relative likelihood, 

Z[Af\m = , (4) 

The summation set, {X}, should include any system states relevant to deciding the plausibility of $\mathcal{C}$ or A.
To see this, assume that the states relevant to deciding $\mathcal{C}$ or A are collected in the space $\mathbb{S}_x$. Then write
$\{X\} = \mathbb{S}_x \times \mathbb{S}_i$, where $X_i \in \mathbb{S}_i$ are irrelevant to A and $\mathcal{C}$ so that
$P(X|A\mathcal{C}I) = P(X_{A\mathcal{C}} X_i|A\mathcal{C}I) = P(X_{A\mathcal{C}}|A\mathcal{C}I)\,P(X_i|I)$.
The sum in Eq. 5 factors into

v „ PjX^XjA^I) _ v P(A\X AV VI) 

If we use information A as an assumption, it should come from known experimental data on the
system. In order to establish A, we may therefore tabulate frequencies for $X \in \mathbb{S}_x$. If $A\mathcal{C}$ turned out to be
true, scientists basing their conclusions only on $\mathcal{C}$ would be increasingly surprised (or skeptical if the report
is second-hand) at the evidence collected after N trials. This is because the probability of these results given
$\mathcal{C}\mathbb{S}_x$ would be (from the multinomial distribution),

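$$ P(\{X\} \sim A\mathcal{C} \,|\, \mathcal{C}\mathbb{S}_x) = N! \prod_i \frac{P(X_i|\mathcal{C}\mathbb{S}_x)^{n_i}}{n_i!}, $$

$$ \lim_{N \to \infty} \frac{1}{N} \ln P(\{X\} \sim A\mathcal{C} \,|\, \mathcal{C}\mathbb{S}_x) = \mathcal{H}[A\mathcal{C}|\mathcal{C}], $$

using

$$ \mathcal{H}[A|B] = -\sum_{X_i \in \mathbb{S}_x} P(X_i|A\mathbb{S}_x) \ln\left[\frac{P(X_i|A\mathbb{S}_x)}{P(X_i|B\mathbb{S}_x)}\right]. \tag{6} $$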



According to $\mathcal{C}$, the likelihood of such a set of observations decreases exponentially with N. This is a
condensed version of the Wallis derivation for the entropy, presented in more detail in Ref. [33]. The limit taken
in the second equation is as $N \to \infty$, which is appropriate for assessing such a set of hypothetical
observations or second-hand reports. Evidently, the Kullback-Leibler divergence, $-\mathcal{H} \geq 0$, represents the value of
the information $A\mathcal{C}$ (or difference of opinion) to an observer who has already accepted $\mathcal{C}\mathbb{S}_x$. The relative
information entropy, $\mathcal{H}$, reaches its maximum, zero, when the new information does not alter the
distribution. For any reasonable comparison to be made, the distributions must be compared over the same set, $\mathbb{S}_x$,
which should include any observational information that A or B may predict. As in the case for the free energy
difference, above, the relative entropy is independent of the distribution over irrelevant variables, $X_i \in \mathbb{S}_i$. This
happens here because the two probability assignments are identical over the irrelevant subspace.
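The exponential decay of the multinomial likelihood can be checked numerically. The sketch below assumes two hypothetical three-state distributions, `P_C` for the accepted state of knowledge and `P_AC` for the state that turns out to be true; the observed counts are taken to follow `P_AC` exactly. None of these numbers come from the paper.

```python
import math

# Hypothetical distributions over three states (illustrative values only).
P_C = [0.5, 0.3, 0.2]    # assignment under the accepted state of knowledge
P_AC = [0.2, 0.3, 0.5]   # assignment under the state that is actually true

def relative_entropy(p_a, p_b):
    """H[A|B] = -sum_i P(X_i|A) ln(P(X_i|A)/P(X_i|B)); never positive."""
    return -sum(pa * math.log(pa / pb) for pa, pb in zip(p_a, p_b) if pa > 0)

def log_multinomial(counts, p):
    """ln of N! * prod_i p_i^{n_i} / n_i!, evaluated stably with lgamma."""
    total = sum(counts)
    out = math.lgamma(total + 1)
    for n_i, p_i in zip(counts, p):
        out += n_i * math.log(p_i) - math.lgamma(n_i + 1)
    return out

def rate(n_trials):
    """(1/N) ln P(counts | C) when the counts match the frequencies of P_AC."""
    counts = [round(n_trials * q) for q in P_AC]
    return log_multinomial(counts, P_C) / sum(counts)
```

Comparing `rate(100)` and `rate(10000)` against `relative_entropy(P_AC, P_C)` shows the per-trial log-likelihood approaching the relative entropy as N grows, with a correction that shrinks roughly as (ln N)/N.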
3.1 Transitivity

The above arguments showed how to add information incrementally. If the starting point is taken to be $I$, it is
then possible to assign a rank to all states, 

Z [ ? 1^S P ( X ^ (7) 

fcp(<p\xiy v \ 



where $|\mathcal{C}|$ denotes the size of the set, $\mathcal{C}$, and $Z[\Phi] = 1$. We may compare any two states using

$$ \frac{P(\mathcal{C}|I)}{P(\mathcal{D}|I)} = \frac{Z[\mathcal{C}]}{Z[\mathcal{D}]}. \tag{8} $$



To show that this can be computed using successive addition of information and Eq. 4, it is necessary to
prove that any order of information addition leads to Eq. 7. The proof is a direct consequence of Bayes' theorem
(Eq. 1).

$$
\begin{aligned}
P(AB|I) &= P(A|I)\,P(B|AI) \\
 &= P(A|I) \sum_{X \in \mathbb{S}_x} P(BX|AI) \\
 &= P(A|I) \sum_{X \in \mathbb{S}_x} P(B|XAI)\,P(X|AI).
\end{aligned}
$$

The last formula is exactly the form of Eqs. 4 and 5 (with the normalization removed) and the above derivation
is symmetric in A and B.
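The order independence shown above can be illustrated with a small numerical sketch. It assumes a hypothetical three-state prior and two likelihood tables, and further assumes that A and B are conditionally independent given X, so that P(B|XAI) = P(B|XI); these choices are for illustration only.

```python
def add_information(prior, likelihood):
    """One step of adding information: returns (P(A|C), P(X|AC)) for each X."""
    evidence = sum(l * p for l, p in zip(likelihood, prior))
    posterior = [l * p / evidence for l, p in zip(likelihood, prior)]
    return evidence, posterior

# Hypothetical inputs (illustrative values only).
PRIOR = [0.5, 0.3, 0.2]    # P(X|I)
LIKE_A = [0.9, 0.5, 0.1]   # P(A|XI)
LIKE_B = [0.2, 0.4, 0.8]   # P(B|XI), assumed equal to P(B|XAI)

# Path 1: add A first, then B.
p_a, post_a = add_information(PRIOR, LIKE_A)
p_b_given_a, post_ab = add_information(post_a, LIKE_B)
joint_ab = p_a * p_b_given_a

# Path 2: add B first, then A.
p_b, post_b = add_information(PRIOR, LIKE_B)
p_a_given_b, post_ba = add_information(post_b, LIKE_A)
joint_ba = p_b * p_a_given_b
```

Both paths give the same joint likelihood and the same final distribution over X, so the product of step-wise likelihood ratios does not depend on the order of the steps.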

The relative entropy cannot be so defined, since it compares distributions. However, the entropy with
respect to a complete space,

$$ \mathcal{H}[\mathcal{C}\mathbb{S}_x|\mathbb{S}_x], \tag{9} $$

can be compared among all states which depend only on the space $\mathbb{S}_x$. In the sections below, it will be shown
that adding maximum entropy-type information, B, makes
$\mathcal{H}[AB\mathbb{S}_x|\mathbb{S}_x] = \mathcal{H}[AB\mathbb{S}_x|A\mathbb{S}_x] + \mathcal{H}[A\mathbb{S}_x|\mathbb{S}_x]$.
However, this is not necessarily true when Eq. |2T] does not hold.



3.2 Inference 

The above concepts may be solidified using the inference process as an example. Given a model, M, for
how data may be generated, we may use any prior information or symmetries of the problem to write down
a prior state of knowledge, $\mathbb{S}_\theta M$. The prior distribution over the parameter space, $P(\theta|\mathbb{S}_\theta M)$, is then given
by the free energy for the process $\mathbb{S}_\theta M \to \theta\mathbb{S}_\theta M = \theta M$. Next, some data, $D_1$, is collected and the state of
knowledge updated to $D_1\mathbb{S}_\theta M$. The free energy for $D_1\mathbb{S}_\theta M \to \theta D_1 M$ now gives the posterior distribution.
Bayes' theorem appears as the thermodynamic cycle identity between the free energy for $D_1\mathbb{S}_\theta M \to \theta D_1 M$
and $D_1\mathbb{S}_\theta M \to \mathbb{S}_\theta M \to \theta M \to \theta D_1 M$,

$$ \frac{Z[\theta D_1 M]}{Z[D_1\mathbb{S}_\theta M]} = \left(\frac{Z[D_1\mathbb{S}_\theta M]}{Z[\mathbb{S}_\theta M]}\right)^{-1} \times \frac{Z[\theta M]}{Z[\mathbb{S}_\theta M]} \, \frac{Z[\theta D_1 M]}{Z[\theta M]}. $$

Interestingly, inference using Bayes' theorem is increasingly being used to estimate the probabilities of free
energies from computational sampling experiments [64, 54, 58], and this can be generalized to estimating free
energy functionals [23, 59].

The relative entropy between $\mathbb{S}_\theta M$ and $D_1\mathbb{S}_\theta M$ measures how informative $D_1$ is in determining $\theta$, while
$\mathcal{H}[D_1 D_2\mathbb{S}_\theta|D_1\mathbb{S}_\theta]$ shows the amount of information that $D_2$ conveys once $D_1$ is known.
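A minimal sketch of this inference cycle follows, assuming a hypothetical coin-bias model on a discrete grid of parameter values with a uniform prior. The grid, the data counts, and the function names are all invented for this example; the binomial coefficient is omitted because it cancels in every ratio.

```python
import math

# Hypothetical parameter grid and uniform prior (indifference over the grid).
THETAS = [i / 10 for i in range(1, 10)]
PRIOR = [1 / len(THETAS)] * len(THETAS)

def update(dist, heads, tails):
    """Add data to a state of knowledge: returns (evidence, posterior)."""
    weights = [p * t**heads * (1 - t)**tails for p, t in zip(dist, THETAS)]
    evidence = sum(weights)
    return evidence, [w / evidence for w in weights]

def relative_entropy(p_new, p_old):
    """H[new|old] = -sum p_new ln(p_new/p_old); zero only if nothing changed."""
    return -sum(a * math.log(a / b) for a, b in zip(p_new, p_old) if a > 0)

# Step-wise path: add D1 (7 heads, 3 tails), then D2 (6 heads, 4 tails).
z1, post1 = update(PRIOR, heads=7, tails=3)
z2, post12 = update(post1, heads=6, tails=4)

# Direct path: add D1 and D2 together.
z12, post_direct = update(PRIOR, heads=13, tails=7)

info_d1 = relative_entropy(post1, PRIOR)             # information from D1
info_d2_given_d1 = relative_entropy(post12, post1)   # information from D2 once D1 is known
```

The product `z1 * z2` equals `z12`, which is the cycle identity in this toy setting, and the two relative entropies quantify how much each data set changes the distribution over the parameter.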
3.3 Replacement of Information

The second type of process is that of completely replacing information. For this case, we may apply all the 
formulas of the previous section, and compare $\mathcal{C} \to A\mathcal{C}$ with $\mathcal{C} \to B\mathcal{C}$. However, instead of computing each
of these separately, we wish to directly compare the total likelihood between A and B. We thus fix either 
situation, and use 

This shows that the distribution over X|A can be transformed from X|B via re-weighting, although this
is known to be computationally inefficient [65]. However, if there is an X for which $P(X|B\mathcal{C})$ is zero, but for
which $P(A|X\mathcal{C}) = P(X|A\mathcal{C})\,P(A|\mathcal{C})/P(X|\mathcal{C})$ is non-zero, then the above expression cannot be evaluated. Therefore, if B
contains a restriction on the set of allowable X, then this restricts mutual comparison among A, B. Likelihood
ratios can only be computed directly using Eq. 10 if $P(X|B\mathcal{C})$ is nonzero on a smaller space $\Omega \subset \{X\}$ than
the $\{X\}$ on which $P(X|A\mathcal{C})$ is nonzero. To some extent, this caveat explains the computational problems involved
with re-weighting samples [43].




Fig. 1 Reaction diagram showing system states as nodes. Two constraints, $\mathbb{S}$, defining a coordinate space, and $\Omega$, defining some
further restriction, are illustrated here. F and G are average value constraints, and their relative likelihoods can be calculated using
Eq. 12 in either direction. For identical constraints, all hypotheses are completely connected, as shown by the double-headed,
dark arrows. Restrictions such as $\mathbb{S}$ or $\Omega$ limit the set of propositions that can be directly compared without knowledge of
$P(\Omega|M)/P(\Phi|M)$, and only one comparison direction is allowed, illustrated by the grey, dotted arrows.



Propositions defined inside $\Omega$ can still be compared against one another, and their likelihoods computed
from either the null hypothesis, $\Phi$, or a new null hypothesis, $\Phi\Omega$, defined relating only to X allowed by
$\Omega$. Addition of the information, $B = B\Omega$, to a thermodynamic state can be represented using a commutation
diagram (Fig. 1), where paths represent step-wise addition of constraints / hypotheses. Completely commuting
classes share an underlying definition of coordinate space. Whenever information of the type $\Omega$ is added,
it directly bears on subsequent propositions. Paths adding $B\Omega$ will therefore restrict the set of subsequent
questions that may be asked without knowledge of $P(\Omega|\mathcal{C})$. These paths are therefore represented by a
directed edge, branching from the above completely connected graph. The commutation diagram terminology
is justified by noting that the multiplicative functions (11), transforming one probability distribution into
another, arrive at the same distribution function for any 'allowed' path.
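Because the free energy formula (5) is simply Eq. 10 for the special case $B = \Phi$, it is convenient to define

$$w_{B \to A}(X) \equiv \frac{P(A|X\mathcal{C})}{P(B|X\mathcal{C})} \qquad (11)$$

so that free energy differences can be expressed more simply as

$$\frac{P(A|\mathcal{C})}{P(B|\mathcal{C})} = \langle w_{B \to A} | B \rangle. \qquad (12)$$

As their name implies, these are weights,

$$P(X|AI) = \frac{w_{B \to A}(X) P(X|BI)}{\sum_{X \in \mathbb{S}_x} w_{B \to A}(X) P(X|BI)}, \qquad \langle f(X) | A \rangle = \frac{\langle w_{B \to A} f | B \rangle}{\langle w_{B \to A} | B \rangle}. \qquad (13)$$

It must be understood that the re-weighting is only valid when $w_{B \to A} < \infty$.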



4 Specific Applications 

The formulas derived in the last section are well-known relations in statistical mechanics. When Boltzmann
factors are inserted for the weights $w_{A \to B}(x)$, Eq. 13 regenerates free energy perturbation and umbrella sampling
formulas [9] and unambiguously identifies $P(X|A)$. However, a few important differences from the standard
development can be noticed in the above. First, the commutativity of thermodynamic cycles is perhaps not
as widely appreciated as it should be. Although it is well known that $-\ln Z$ is a state function, its
definition in Eq. 7 shows that a sum of relative free energy differences around any closed loop of a
thermodynamic cycle totals to zero, with the caveat that Eq. 12 may only be applied from a larger phase space
to a smaller. The same is not true of relative entropies (6), which give a sum dependent on the path taken.
Instead, it is necessary to define J^f^S^lSJ as the state function. Also, the entropy definition of Eq. [6] is 
independent of changes in phase-space volume because $P(X|AI)$ transforms the same way as $P(X|BI)$ for an
injective change of variables $X \to Y$.

The physical problem of determining $P(A|X\mathcal{C})/P(B|X\mathcal{C})$ has not yet been addressed. Because this function
can be expressed as a ratio, we need only specify $w_{\Phi \to A}(X\mathcal{C})$. Extending the concept of the partition
function (Eq. 7), the weights can be interpreted as $w_{\Phi \to A}(X\mathcal{C}) = Z[AX\mathcal{C}]/Z[X\mathcal{C}]$. We will present arguments
for defining this function for several different types of problems, and find that the standard Boltzmann-factor
form, $e^{-\beta H_A(x)}$, is not a universal answer. The general idea will be to find a minimal set of relevant
information XY, implied by $X\mathcal{C}$, so that A is conditionally independent from $\mathcal{C}$ when XY is known, simplifying
the weight to $w_{\Phi \to A}(X\mathcal{C}) = w_A(XY)$. Comparing $w_A$ for different XY then suggests an appropriate relative
weight. Specific problems relating to changes in the symmetries of phase space will be addressed in a separate
paper.



4.1 Constraints on Phase Space

A simple type of constraint is one that limits hypothesis space. 

$\Omega$: The set of allowed states is limited to those in which X is a member of the set, $\Omega$.

This type of constraint can be used to limit investigations to interesting or highly probable configurations, as
well as to formulate decision problems. Adding $\Omega$ to a state results in a normalization

$$P(X|\Omega A) = \frac{P(X|A) I(X \in \Omega)}{\sum_{X'} P(X'|A) I(X' \in \Omega)}, \qquad (14)$$

where the indicator function, $I(\cdot)$, is one when the condition is satisfied, and zero otherwise.
Given some X, two constraints that both allow X should be equally likely, leading to the assignment

$$w_\Omega(XA) = w_\Omega(X) = I(X \in \Omega) \qquad (15)$$

for any A that does not specifically reference $\Omega$.

The free energy of the constrained system is the denominator of Eq. 14,

$$Z[A\Omega]/Z[A] = \sum_X P(X|A) I(X \in \Omega). \qquad (16)$$

This is consistent with the assignment in the appendix derived for the case A = (p (Eq.[2]l. 
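
As a concrete illustration of Eqs. 14 and 16, the following minimal sketch restricts a discrete distribution to an allowed set and returns the normalization as the partition function ratio. The function name `restrict`, the state labels, and the probabilities are hypothetical choices made only for this example.

```python
def restrict(p, omega):
    """Apply a phase-space constraint Omega to a discrete distribution.

    p     : dict mapping state -> P(X|A)
    omega : set of allowed states
    Returns (P(X|Omega A), Z[A Omega]/Z[A]); the second item is the
    denominator of the normalized, indicator-masked distribution.
    """
    z_ratio = sum(prob for x, prob in p.items() if x in omega)
    if z_ratio == 0.0:
        raise ValueError("Omega excludes every state with nonzero probability")
    p_omega = {x: (prob / z_ratio if x in omega else 0.0)
               for x, prob in p.items()}
    return p_omega, z_ratio

# Illustrative four-state distribution; Omega keeps states b and d.
p_a = {"a": 0.1, "b": 0.2, "c": 0.3, "d": 0.4}
p_omega, z_ratio = restrict(p_a, {"b", "d"})
```

For a uniform $P(X|A)$ the returned ratio reduces to the fraction of states inside $\Omega$, so the constrained partition function simply counts the allowed states.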
Given two constraints, we can use Bayes' theorem to show

$$P(\Omega_1|\Omega_2 F I) = \frac{P(F\Omega_1\Omega_2|I)}{P(F\Omega_2|I)} = \frac{P(F|\Omega_1\Omega_2 I)}{P(F|\Omega_2 I)} P(\Omega_1|\Omega_2 I), \qquad (17)$$

a simple theorem relating free energies in successively constrained spaces. If F gives no information deciding
whether X satisfies both $\Omega_1\Omega_2$ vs. only $\Omega_2$, then the constrained likelihoods, $P(F|\Omega I)$, should be equal.
Because the principle of indifference gives $P(X|\Omega I) = 1/|\Omega|$, and we have shown that $P(\Omega|I) = \text{const.} \times |\Omega|$,
it is possible to not only compare energetic hypotheses, such as F vs. G, but also constraints on phase space.
Eq. 17 also shows that once information of this type ($\Omega$) has been moved to the right-hand side, then we
will not be able to eliminate it using Eq. 10. Rather, once $\Omega$ has been assumed, then subsequent addition of
information will have to include $\Omega$ as part of $\mathcal{C}$ on the right-hand side of Eq. 5. To remove this information
and get $P(F|I)$ would require $P(\Omega|FI)$.

As an example, we consider the multi-ion binding site at a K$^+$-ion channel selectivity filter (Fig. 2) [1].
Four cationic binding sites are distinguished, and it is assumed that the channel presents a high enough
energetic penalty to exclude the possibility of anion occupancy. We do not expect multiple ion occupancy of the
same site to be possible (or highly probable) because of mutual electrostatic repulsion and geometric features
of the channel. This leads us to the fermion-like default statistics,

$$\mathbb{S}_x = \oplus[N_0,\; N_1 \cdot \oplus(X_1, X_2, X_3, X_4),\; N_2 \cdot \oplus(X_1X_2, X_1X_3, X_1X_4, X_2X_3, X_2X_4, X_3X_4),$$
$$N_3 \cdot \oplus(X_2X_3X_4, X_1X_3X_4, X_1X_2X_4, X_1X_2X_3),\; N_4 X_1X_2X_3X_4]$$

where n particles may occupy k states in $\binom{k}{n}$ ways for a total of $2^k$ elementary states of the system. In the
absence of any other information, each state is equally likely.

This probability distribution factors into a product of independent distributions for each site, with equal
probability for occupied and unoccupied states. The distribution is shown for reference in Fig. 3a.

The partition function is the number of states, $Z[\mathbb{S}_x] = 2^4$ (16). Using the same equation, the partition
function of a constrained system, for example at fixed N, is $Z[N\mathbb{S}_x] = Z[\mathbb{S}_x] \sum_{X|N} P(NX|\mathbb{S}_x I) = \binom{4}{N}$. The much
debated 'degeneracy factor' for particle counting has already crept in as a consequence of the definition
(18), since in the limit $K \gg N$, $\binom{K}{N} \to K^N/N!$. In the following discussion we will successively incorporate
mechanical information including the average system energy, and mutual interactions between the ions and
the channel.
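
The state counting above can be checked by direct enumeration. The sketch below is only an illustration: representing each occupancy state as a tuple of four 0/1 site occupancies is an assumption made here for convenience, and `degeneracy_ratio` is a hypothetical helper used to show the large-K limit.

```python
from itertools import product
from math import comb, factorial

K = 4                                            # number of binding sites
states = list(product((0, 1), repeat=K))         # one 0/1 flag per site
Z_total = len(states)                            # number of elementary states

# Partition function at fixed ion number N: count states with N occupied sites.
Z_by_N = [sum(1 for s in states if sum(s) == n) for n in range(K + 1)]

def degeneracy_ratio(k, n):
    """Ratio of the exact count C(k, n) to the dilute-limit estimate k^n / n!."""
    return comb(k, n) / (k ** n / factorial(n))
```

Here `Z_total` is 16 and `Z_by_N` is `[1, 4, 6, 4, 1]`, and `degeneracy_ratio(k, n)` approaches one as k grows at fixed n, which is the sense in which the $1/N!$ factor emerges from counting alone.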



4.2 Addition of Maximum Entropy-Type Information 

The significance of $-\ln Z$ as the (non-dimensional) Gibbs free energy should be immediately recognized. To
show this formally, define a hypothesis, F, as

F: The probability distribution of the system, given that F is accepted, is the most likely observational
distribution that obeys $\langle f(x) | FA \rangle = F$ for any A.
Fig. 2 KcsA ion channel selectivity filter in its biological orientation (intracellular solution below) showing ion binding sites
S1-S4. For visual clarity, two of the four identical monomer units are not shown. Physiological conventions for the potential
difference, $\Delta V$, and direction of outward positive current (g) are indicated.



Rather than specifying an absolute state, this information is phrased in terms of the change in the probability
distribution from an initial state before F has been accepted. Representing this prior state of knowledge by
A, then according to the argument above, the least surprising distribution given information $\langle f(x)|FA \rangle$ is the
maximum entropy distribution. This distribution should satisfy the mathematical condition,

$$P(X|FA) = \arg\max \mathcal{H}[FA|A] \quad \text{s.t.} \quad \langle f(x)|FA \rangle = F. \qquad (20)$$

The unique solution to this condition is

$$P(X|FA) = \frac{P(X|A) \left|\frac{dx_A}{dx_{FA}}\right| e^{-\lambda f(x)}}{\int P(X|A) \left|\frac{dx_A}{dx_{FA}}\right| e^{-\lambda f(x)} \, dx_{FA}}, \qquad (21)$$

for some $\lambda(A)$, proving that the hypothesis F (Eq. 20) is logically equivalent to assuming the probability
assignment of Eq. 21. The Jacobian has been explicitly shown in this equation because of the importance
of continuous functions in thermodynamics. In a discrete setting, it has the effect of dividing $P(X|A)$ to
maintain its normalization. At this solution, the value of $\mathcal{H}$ is

$$\mathcal{H}_{\max}[FA|A] = \lambda F + \ln \int P(X|A) \left|\frac{dx_A}{dx_{FA}}\right| e^{-\lambda f(x)} \, dx_{FA}. \qquad (22)$$
According to Bayes' theorem,

$$P(F|XA) = \frac{P(X|FA) \, dx_{FA} \, P(F|A)}{P(X|A) \, dx_A} = \frac{P(F|A) \, e^{-\lambda f(x)}}{\int P(X|A) \left|\frac{dx_A}{dx_{FA}}\right| e^{-\lambda f(x)} \, dx_{FA}}. \qquad (23)$$

To find the probability of F from a given X, we consider two cases. First, assume X (and I) constitute the
only data relevant to deciding the plausibility of F. Then $P(F|XAI) = P(F|XI)$ and the terms involving A
must evaluate to a constant in the above, so that

$$P(F|XAI) = \mathrm{const}(I) \, e^{-\lambda f(x)} \qquad \text{case 1}. \qquad (24)$$
According to the principle of indifference, the leading constant must not depend on F, and thus is present to
remind us that we are only able to compute likelihood ratios. We thus have the Boltzmann weight

In the second case, we may split A into two pieces of information: B, determining some weighting over
a set of hypotheses of which F is a member, and other information, $A' = A \setminus B$, irrelevant to F when X is known.
Obviously, maximum-relative entropy hypotheses fall into $A'$, since they are making statements about X and
not other hypotheses. Therefore,

$$P(F|XA'BI) = \mathrm{const}(F; B) \, e^{-\lambda f(x)} \qquad \text{case 2}. \qquad (26)$$

The information, B, thus functions as a nuisance parameter [33] because different assumptions lead to
different assignments of plausibilities among the F in some class that B affects, and we have
$w_F(XA) = L(F; A) e^{-\lambda f(x)}$. Because this type of information leads naturally to consideration of alternate classes of
hypotheses, we recognize this dividing information to be associated with a set, $\Omega$, of hypothesis space. If B
re-weights relative likelihoods among alternate $F \in \Omega$, then F has effectively become a coordinate and B an
energy-type constraint. If B re-weights all $F \in \Omega$ by the same amount, then its effect is to shift $P(\Omega)$. We
therefore arrive at the diagram picture of Fig. 1. Relative likelihoods between nodes can be computed via
Eq. 10 (case 1) or Eq. 5 (case 2). Subgraphs of this structure represent thermodynamic cycles.

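Sequentially using the maximum-relative entropy hypothesis, F, requires special consideration of the
order in which information is added. For this type of constraint, the probability distribution is found to be
independent of the order of information addition. This can be verified by recursion, writing the result of
applying Eq. 21 twice. Surprisingly, the relative entropies add to the state function Eq. 9. Starting from $\mathbb{S}_x$ and
moving to $F\mathbb{S}_x$ gives

$$\mathcal{H}[F\mathbb{S}_x|\mathbb{S}_x] = \sum_X P(X|F\mathbb{S}_x) \ln \frac{P(X|\mathbb{S}_x)}{P(X|F\mathbb{S}_x)}.$$

Adding $F\mathbb{S}_x \to FG\mathbb{S}_x$ gives

$$\mathcal{H}[FG\mathbb{S}_x|F\mathbb{S}_x] = \sum_X P(X|FG\mathbb{S}_x) \ln \frac{P(X|F\mathbb{S}_x)}{P(X|FG\mathbb{S}_x)}$$
$$= \mathcal{H}[FG\mathbb{S}_x|\mathbb{S}_x] + \sum_{X \in \mathbb{S}_x} P(X|FG\mathbb{S}_x) \ln \frac{P(X|F\mathbb{S}_x)}{P(X|\mathbb{S}_x)}$$
$$= \mathcal{H}[FG\mathbb{S}_x|\mathbb{S}_x] - \lambda F - \ln \frac{P(F|\mathbb{S}_x)}{P(\Phi|\mathbb{S}_x)}$$
$$= \mathcal{H}[FG\mathbb{S}_x|\mathbb{S}_x] - \mathcal{H}[F\mathbb{S}_x|\mathbb{S}_x].$$

Therefore, when F is a maximum-relative entropy hypothesis,

$$\mathcal{H}_{\max}[FGA|A] = \mathcal{H}[FGA|FA] + \mathcal{H}[FA|A]. \qquad (27)$$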
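
The additivity of relative entropies for maximum entropy constraints can be verified numerically on a small discrete example. This is a sketch under assumptions made only for illustration: the prior, the observables `f` and `g`, and the multipliers are hypothetical, the joint constraint distribution is built by choosing multipliers by hand, and the single-constraint multiplier is found by bisection.

```python
import math

def tilt(prior, exponents):
    """Normalize prior[i] * exp(-exponents[i])."""
    unnorm = [p * math.exp(-e) for p, e in zip(prior, exponents)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def rel_entropy(p, q):
    """H[p|q] = sum_X p ln(q/p)."""
    return sum(pi * math.log(qi / pi) for pi, qi in zip(p, q) if pi > 0)

def mean(p, f):
    return sum(pi * fi for pi, fi in zip(p, f))

prior = [0.1, 0.2, 0.3, 0.4]             # P(X|A), illustrative values
f = [0.0, 1.0, 2.0, 3.0]
g = [1.0, 0.0, 1.0, 0.0]

# P(X|FGA): maximum entropy form with hand-picked multipliers for f and g.
lam_fg, mu = 0.7, 0.4
p_FGA = tilt(prior, [lam_fg * fi + mu * gi for fi, gi in zip(f, g)])
F = mean(p_FGA, f)                       # the average that hypothesis F asserts

# P(X|FA): the f-constraint alone, with lam chosen so that <f> = F.
lo, hi = -50.0, 50.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if mean(tilt(prior, [mid * fi for fi in f]), f) > F:
        lo = mid
    else:
        hi = mid
p_FA = tilt(prior, [0.5 * (lo + hi) * fi for fi in f])

lhs = rel_entropy(p_FGA, prior)                              # H[FGA|A]
rhs = rel_entropy(p_FGA, p_FA) + rel_entropy(p_FA, prior)    # H[FGA|FA] + H[FA|A]
```

The two sides agree to numerical precision because $\ln[P(X|FA)/P(X|A)]$ is linear in $f(x)$ and both distributions share the same average of f.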

Jaynes [33] has used the functional Eq. 6 and Eq. 7 to derive a host of general relations for maximum
entropy constraints including the computation of averages,

$$\langle f_j(x) \rangle = -\frac{\partial \ln Z}{\partial \lambda_j},$$

and the (Legendre transform of the) first law of thermodynamics

d(-lnZ[{Fj}S x }) = £ (fj(x) \V) dkj - fj^N, (28) 
from which the Gibbs relations, 



« - mum - if Am 
may be found. We find that it is more appropriate to phrase these relationships in terms of Legendre transforms
of the entropy functional, $\mathcal{F} = \langle \lambda_j f_j(x) \rangle - \mathcal{H}$, for the specific problems considered in Sec. 4.4. This
distinction was unnecessary before because $\mathcal{F} = -\ln Z$ for distributions derived strictly from constraints of
the maximum-entropy form.

A central maximum entropy constraint in statistical mechanics is a constraint on average energy. We label
this constraint by $\beta$. A simplified energy function is constructed for the ion channel system by including a
mutual Coulomb repulsion between the ions, constrained to the vertical axis and spaced at 3.5 Å. We also
assume a simple stabilization energy for each ion from the protein, $E^\circ \approx -115$ kcal/mol. Abbreviating NX to
X, the energy function is

E(X)=E(n,x) = \Z . f v v r /&Xj)+£E°Z&). (30) 



Placing this constraint on the average system energy at constant N leads to the well-known canonical distri- 
bution with partition function 



Z[NPS X 



P(NPS X \I) 
P(<5|/)P(<p|/) 

= Z[NS x ]^{X\NE x )w p (NX) = 



-PE(X) 



Here, it can be seen that the probability for $N\mathbb{S}_x$, $\binom{4}{N} P(\Phi|I)$ (15), cancels in the expression so that the
increment $Z[N\beta\mathbb{S}_x]/Z[N\mathbb{S}_x]$ is an average according to Eq. 5. Removing the constraint on N also leads to the
multicanonical ensemble in the same way, viz. $Z[\beta\mathbb{S}_x] = \sum_N Z[N\beta\mathbb{S}_x]$ (Eq. 16), $P(N|\beta\mathbb{S}_x) = Z[N\beta\mathbb{S}_x]/Z[\beta\mathbb{S}_x]$.

In either case, we can assign the parameter $\beta$ the meaning of, "there exists a physical mechanism that
decreases the likelihood of the system being in a high-energy state." To separate these energy states, we
introduce a constraint on the energy, denoted by E. Thus, if a system were allowed to choose its own energy
state, the force would bias this choice according to $P(E\beta|A)/P(E\Phi|A) = e^{-\beta E}$. We can set this bias, $\beta$,
to give a reference system with known properties by exactly balancing its internal tendency toward higher
energy, $[P(E + dE|A)/P(E|A)] \, e^{-\beta dE} = 1$. This implies that $\beta$ should solve $\beta = \frac{\partial}{\partial E} \ln Z[EA]$ for a reference
system with known energy, for example a thermometer in which energy is easily measured by size expansion.
Because our reference thermometer is constantly exchanging energy with the environment, we usually observe
its average energy, and $\beta$ should be chosen such that $\langle E|\beta A \rangle = -\frac{\partial}{\partial \beta} \ln Z[\beta A]$. The difference between these
values (maximum vs. average energy) is important for small systems, but becomes negligible in the limit of
large system sizes. ['81 Using either of these forces in the present system mimics the effect of allowing energy
exchange between the thermometer at this state and the system. This explains the convention of identifying
temperature with the dilation of a thermometer and its connection to the statistical force, $\beta$.

Another constraint we may add is the inclusion of an external force on the total number of ions, $\mu$.
Because the n ions are more likely to choose an environment with lower energy, $-\mu n$, this changes the
probability of ion occupancy by a factor of $e^{\beta\mu n}$. The multiplier $\beta$ appears because we want to express $\mu$ in
energy units. Just as above, we can choose the chemical potential, $\mu$, to give a reference system with known
properties by balancing its internal energy change on ion addition using the choice $\beta\mu = -\ln(Z[(N+1)A]/Z[NA])$.

We can mimic the effect of allowing K$^+$ transfer from a bulk 100 mM KCl solution to the present system
(with the corresponding Cl$^-$ moved to a similar environment and its contribution neglected) by choosing
$\mu_{K^+} = -81 + \beta^{-1} \ln 0.1$ kcal/mol [16]. Without the constraint on N, the system was effectively allowed to
exchange particles with vacuum. The combination of both constraints, which we refer to as $F = \beta\mu$, is shown
in panel (c) of Fig. 3. The preference for the separated state ($X_1X_4$) in this model shows the effect of mutual
ion repulsion.
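
The occupancy model with both constraints can be sketched as follows. This is an illustrative reconstruction, not the authors' code, and several inputs are assumptions: an unscreened vacuum Coulomb constant, a per-ion stabilization energy of -115 kcal/mol, and a thermal energy of 0.593 kcal/mol (roughly room temperature). Only the 3.5 Å spacing and the form of the chemical potential come from the text. Each state is weighted by $e^{-\beta(E(X) - \mu N)}$.

```python
import math
from itertools import product

COULOMB = 332.06   # kcal*Angstrom/(mol*e^2); ASSUMED unscreened Coulomb constant
SPACING = 3.5      # Angstrom between adjacent sites (from the text)
E_SITE = -115.0    # kcal/mol stabilization per bound ion; ASSUMED value
KT = 0.593         # kcal/mol, about 298 K; ASSUMED temperature
MU = -81.0 + KT * math.log(0.1)   # chemical potential form quoted in the text

def energy(state):
    """Pairwise Coulomb repulsion plus per-ion stabilization for a 0/1 occupancy tuple."""
    occupied = [i for i, n in enumerate(state) if n]
    pair = sum(COULOMB / (SPACING * (j - i))
               for a, i in enumerate(occupied) for j in occupied[a + 1:])
    return pair + E_SITE * len(occupied)

states = list(product((0, 1), repeat=4))
log_w = {s: -(energy(s) - MU * sum(s)) / KT for s in states}
shift = max(log_w.values())                      # subtract the maximum for stability
Z = sum(math.exp(lw - shift) for lw in log_w.values())
prob = {s: math.exp(lw - shift) / Z for s, lw in log_w.items()}

# Among doubly occupied states, find the most probable arrangement.
two_ion = {s: p for s, p in prob.items() if sum(s) == 2}
best_two_ion = max(two_ion, key=two_ion.get)
```

With any repulsive pair term that decays with distance, the most probable two-ion state is the one with ions at sites 1 and 4, independent of the assumed stabilization energy, since all two-ion states share the same site and chemical potential terms.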



Alternatively, to avoid anthropomorphic terminology, if the system energy is not constrained and we compare the maximum 
entropy P (E\A). 
Fig. 3 Ion occupancy distribution in successively more complex models.
4.3 Addition of Variables (Generalized Ensemble Methods)

The theoretical background in Sec. 3 allows us to go further than the most common relations of
thermodynamics summarized in the last two subsections. In particular, the choice of coordinate space, $\mathbb{S}_x$, is no different
than any other constraint except that it is almost never moved to the left-hand side to form quantities such
as $P(F\mathbb{S}_x|I)$, and comparisons between states are carried out almost exclusively with a fixed $\mathbb{S}_x$. The addition
of coordinates is associated with the transition from canonical to multicanonical ensembles. It has served as
the starting point for some very difficult reading in thermodynamics textbooks involving over/under counting
and (in)distinguishability arguments. Our definition of $P(\Omega|I)$ counts each 'state of knowledge' once, and
thus directly accounts for (in)distinguishability factors. As will be shown in a subsequent paper, this result
does not require input from quantum mechanics other than a specification of the allowed states of the system.

Since the rules have already been given above, we proceed to an example, addition of protein-ion
interactions by assuming a set of protein conformational states. This leads to the conception of a generalized
ensemble. A simplistic example is provided by assuming (in addition to an open state, O) two 'C-type'
inactivated states in which a pinching motion of the pore prevents occupancy at site 2 (state $I_1$) or sites 2 and 3
(state $I_2$) II13I . These states are assumed to be mutually exclusive and exhaustive, so that all conformational
states, Y, are a member of the space $\Omega = \oplus(O, I_1, I_2)$. Before any coupling is assumed, the total number of
occupancy states, $|\mathbb{S}_x|$, is multiplied $|\Omega|$ times to create the product space, $\Omega \times \mathbb{S}_x$. When the conformational
state is known, $\Omega$ is irrelevant, and we can intuitively use the knowledge of its coupling to X (denoted by G)
to guess a form for $P(X|YFG\mathbb{S}_x)$. This is an instance where intuition runs ahead of logical reasoning, and it
is difficult to see the logical steps required to arrive at this result.

Because X is coupled to Y, the conformation, Y, is also coupled to the occupancy state, X, and it is
necessary to know the full distribution, $P(XY|FG\mathbb{S}_x\Omega)$. This can be rationally arrived at using our thermodynamic
diagram (Fig. 1). We could first add a non-interacting space for $\Omega$ to get $P(XY|F\mathbb{S}_x\Omega) = P(X|F\mathbb{S}_x) P(Y|\Omega)$
($F\mathbb{S}_x \to F\mathbb{S}_x\Omega$) and then add information on their coupling, $F\mathbb{S}_x\Omega \to FG\mathbb{S}_x\Omega$. The distribution of X changes
when $G\Omega$ is known, since G places constraints on both X and Y.

$$P(X|YFG\mathbb{S}_x) = \frac{P(XG|YF\mathbb{S}_x)}{P(G|YF\mathbb{S}_x)} = \frac{P(XG|YF\mathbb{S}_x)}{\sum_{X \in \mathbb{S}_x} P(XG|YF\mathbb{S}_x)} \qquad (31)$$

Is there more to learn from this result? In Sec. 3 we showed that any order of adding the information
leads to equivalent results, as long as free energy differences are computed in the direction of increasing
constraints. Given information $FG\Omega\mathbb{S}_x$ (or $FGY\mathbb{S}_x$), we are able to write down the distribution for XY simply
by maximizing the entropy $\mathcal{H}[FG\Omega\mathbb{S}_x|\Omega\mathbb{S}_x]$ (or $\mathcal{H}[FGY\mathbb{S}_x|\mathbb{S}_x]$). These generate conditional distributions
given information of the type: 'the system is in a given coarse state.' The unanswered question is what the
distribution over the coarse states looks like. To answer this, we consider the process $F\mathbb{S}_x \to FG\mathbb{S}_x\Omega$. The
mutually exclusive and exhaustive condition, $\Omega$, defines a space for the coarse coordinates, Y. However,
without this space, we may still calculate $F\mathbb{S}_x \to YFG\mathbb{S}_x$,

Z[FGYS X ] P{FGYS X \I) 



Z[FS X ] P(&\I)P(FS X \I) 

_ P(GY\FSJ) _ y P(GY\FXI) 

P ( *|i) - 4 "W p {x lFSx) • 

This could also have been arrived at through the intermediary path $F\mathbb{S}_x \to YF\mathbb{S}_x \to YFG\mathbb{S}_x$. The probability
for Y in some mutually exclusive and exhaustive set is a sum of these

Z[FGn$ x ] P(FGQS X \F) 



Z[FS X ] P(*|J)P(FSy/) 

^ P(GY\FSJ) _ y Z[FGYS X ] 
~ Yen "W" ~ Y f n ~WW 

We find again that the partition function of Eq. 7 has a direct probability interpretation as an un-normalized
probability.

This idea forms the basis for understanding the free energy difference as a log-likelihood ratio between
two Hamiltonians as expressed by Eq. 8 and for extending a canonical ensemble into a multi-canonical one.
To perform the extension, define some space over which a previously fixed parameter may vary, and then
integrate the partition function over this space. Given a set of mutually exclusive and exhaustive coarse states,
we may write down the micro/multi split using

$$P(XY|FG\mathbb{S}_x\Omega) = P(X|YFG\mathbb{S}_x) \, P(Y|FG\mathbb{S}_x\Omega) \qquad (32)$$

and the coarse probabilities using either of

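$$P(Y|FG\mathbb{S}_x\Omega) = \frac{\sum_{X \in \mathbb{S}_x} P(XYFG|\mathbb{S}_x\Omega)}{\sum_{Y \in \Omega} \sum_{X \in \mathbb{S}_x} P(XYFG|\mathbb{S}_x\Omega)} = \frac{Z[YFG\mathbb{S}_x]}{\sum_{Y \in \Omega} Z[YFG\mathbb{S}_x]}.$$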



The denominators of the second and third expressions correspond to the free energies for processes $\mathbb{S}_x\Omega \to
FG\mathbb{S}_x\Omega$, and $\Phi \to FG\mathbb{S}_x\Omega$, respectively. This argument holds when Y denotes any type of constraint, and the
generalized ensemble method is an example of the above when Y are alternate Hamiltonians. 112 1 11451

As an aside, the interpretation of Eq. 6 given in the introduction implies that the relative entropy of the addition
$\mathbb{S}_x \to \mathbb{S}_x\Omega$ (as well as $F\mathbb{S}_x \to F\mathbb{S}_x\Omega$) is zero. This is a reasonable result in the following sense. If some
distribution over $X \in \mathbb{S}_x$ is assumed, and new observations of a coordinate, Y, became available that were
nevertheless completely random, then $F\mathbb{S}_x\Omega$ does not have any additional informational value relative to $F\mathbb{S}_x$.
This is contrary to the behavior of the thermodynamic entropy because the thermodynamic entropy increases
whenever states are added to the system, even if they are irrelevant, leading to nonzero entropy for nuclear
spin systems at zero Kelvin. Instead of this behavior, it seems preferable to define the entropy relative to the
completely uniform distribution, as we have done here. In this case, the probability for occupying degenerate
(but distinguishable) states increases because of the counting conventions of the free energy functional.

Incorporating the conformational state information, $G\Omega$, into the ion channel system leads to the results
shown in panels (b, no energetic constraint) and (d, constrained chemical potential and energy) of Fig. 3.
Because fewer states are available to the system in conformations $I_1$ and $I_2$, they appear less often.
Colloquially, they are said to be entropically unfavorable. In our derivation, this entropy decrease came about from
adding information G. This result could have been derived either as a consequence of formally reducing
the number of occupancy states (as we have done) or by assuming a very large energy for un-allowed
occupancies at $I_1$ and $I_2$. The statement, '$I_2$ is entropically unfavorable' is therefore expressing the fact that the
accessible volume for X has decreased from some previously available volume upon changing $\Omega$ to $I_2$ or upon
adding information $I_2G$. The conventional thermodynamic entropy implicitly defines this previously available
volume, regardless of whether such a state physically exists. This dependence is made explicit in the present
definition of a relative entropy.
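
The statement that the inactivated conformations are entropically unfavorable can be made concrete for the case with no energetic constraint, where each conformation's partition function is just its number of allowed occupancy states. The sketch below assumes the tuple representation of occupancy states and the dictionary `BLOCKED`, both of which are illustrative conventions introduced here.

```python
from itertools import product
from fractions import Fraction

# Sites (zero-based) that each conformation closes to ions:
# O blocks nothing, I1 blocks site 2, I2 blocks sites 2 and 3.
BLOCKED = {"O": set(), "I1": {1}, "I2": {1, 2}}

def allowed(state, conf):
    """True if no blocked site is occupied in this conformation."""
    return all(state[i] == 0 for i in BLOCKED[conf])

states = list(product((0, 1), repeat=4))
Z = {conf: sum(1 for s in states if allowed(s, conf)) for conf in BLOCKED}
Z_total = sum(Z.values())
P_conf = {conf: Fraction(z, Z_total) for conf, z in Z.items()}
```

The counts are 16, 8, and 4 allowed states for O, $I_1$, and $I_2$, so the conformational probabilities are 4/7, 2/7, and 1/7: the coarse probability is the constrained partition function divided by the sum over conformations.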



4.4 Conditional Maximum-Entropy Information 

If, instead of the energy function assumed for F in the above example, we had assumed some experimentally 
known probability distribution over X, then adding information G becomes qualitatively different. In order 
to not interfere with the distribution over X, the information F must take priority over any other constraints 
we may add to the problem. However, this does not prevent us from coupling Y to X using the conventional 
maximum-relative entropy hypothesis, 

G: The probability of XY, given that G is accepted, is the most likely observational distribution that
obeys $\langle g(y;x) | AXG \rangle = G(X)$ for any AX.

This is because the entropy functional decomposes as

$$\mathcal{H}[AG\mathbb{S}_x\Omega | A\mathbb{S}_x\Omega] = \sum P(XY|AG\mathbb{S}_x\Omega) \ln \frac{P(XY|A\mathbb{S}_x\Omega)}{P(XY|AG\mathbb{S}_x\Omega)}$$
$$= \mathcal{H}_X[AG\mathbb{S}_x\Omega | A\mathbb{S}_x] + \sum_X P(X|AG\mathbb{S}_x\Omega) \, \mathcal{H}_Y[AGX\Omega | AX\Omega]. \qquad (35)$$

The sums in this section are all taken to be over $X \in \mathbb{S}_x$ and $Y \in \Omega$ without loss of generality since we choose
$\mathbb{S}_x \times \Omega$ to be the set of all XY relevant to deciding A or G. The last term in the expansion above is a conditional
entropy, which is a functional of $P(Y|AGX\Omega)$ and depends on X. Because each conditional distribution can
be chosen independently from the others and from $P(X|AG\mathbb{S}_x\Omega)$, the entropy of each one is independently
maximized when $\mathcal{H}[AG\mathbb{S}_x\Omega | A\mathbb{S}_x\Omega]$ is maximum. However, the presence of Y allows $\mathcal{H}_X[AG\mathbb{S}_x\Omega | A\mathbb{S}_x]$ to
differ from $\mathcal{H}_X[A\mathbb{S}_x | A\mathbb{S}_x] = 0$, since $P(X|AG\mathbb{S}_x\Omega) = \sum_Y P(XY|AG\mathbb{S}_x\Omega)$. For these two to be equal in general
requires that $P(X|AG\mathbb{S}_x\Omega) = P(X|A\mathbb{S}_x)$, i.e. that the distribution of X is not dependent on the information
$G\Omega$ when A is present.

Because we want to specify the marginal distribution of X directly, it is convenient to denote this
information as the compound hypothesis,

$F_X$: The probability distribution of X is determined by information $F_X$ and unchanged by information
$G\Omega$.



When this hypothesis is in place, we will have $P(X|F_XG\mathbb{S}_x\Omega) = P(X|F_X\mathbb{S}_x)$. Bayes' theorem says that we
must also have $P(G\Omega|XF_X\mathbb{S}_x) = P(G\Omega|F_X\mathbb{S}_x)$, implying $w_{G\Omega}(F_XX) = 1$. Effectively, the Y have become
'imaginary states' to the system in the sense that there is no free energy change for $F_X\mathbb{S}_x \to F_XG\mathbb{S}_x\Omega$.






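Although there is no change to $\mathcal{H}_X$ or the distribution of X, maximizing (35) results in

$$P(Y|F_XGX\Omega) = \frac{P(Y|F_XX\Omega) \, e^{-\lambda g(y;x)}}{\sum_{Y \in \Omega} P(Y|F_XX\Omega) \, e^{-\lambda g(y;x)}}, \qquad (36)$$

an expression reminiscent of the transition probability for a Markov process. The conditional entropy is

$$\mathcal{H}_Y[F_XGX\Omega | F_XX\Omega] = \sum_Y P(Y|F_XGX\Omega) \ln \frac{P(Y|F_XX\Omega)}{P(Y|F_XGX\Omega)}
= \langle \lambda g(y;x) | F_XGX\Omega \rangle + \ln \sum_{Y \in \Omega} P(Y|F_XX\Omega) \, e^{-\lambda g(y;x)},$$

and we define as usual

$$w_G(XY) = \frac{P(G|XYI)}{P(\Phi|XYI)} = e^{-\lambda g(y;x)}.$$

These considerations are sufficient to fill out the thermodynamic cycle when $F_X$ is assumed, as has been done
in the left half of Fig. 4.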
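
The key property of the conditional construction, that the marginal over X is left untouched while Y still becomes informative about X, can be seen in a small sketch. Everything specific here is hypothetical: the two occupancy labels, the uniform conditional prior over conformations, the coupling function `g`, and the multiplier value are chosen only to illustrate the structure of the conditional maximum entropy assignment.

```python
import math

def conditional_maxent(p_x, prior_y_given_x, g, lam):
    """Joint P(XY) = P(X|F_X) * P(Y|F_X G X), tilting only the conditional on Y."""
    joint = {}
    for x, px in p_x.items():
        unnorm = {y: q * math.exp(-lam * g(y, x))
                  for y, q in prior_y_given_x[x].items()}
        z = sum(unnorm.values())             # normalizer for this particular x
        for y, u in unnorm.items():
            joint[(x, y)] = px * u / z
    return joint

# Hypothetical inputs: a fixed distribution over X and a uniform prior over Y.
p_x = {"empty": 0.2, "bound": 0.8}
prior = {x: {"O": 1 / 3, "I1": 1 / 3, "I2": 1 / 3} for x in p_x}

def g(y, x):
    """Illustrative coupling: inactivated states are penalized more when an ion is bound."""
    if y == "O":
        return 0.0
    return 5.0 if x == "bound" else 0.5

joint = conditional_maxent(p_x, prior, g, lam=1.0)
marg_x = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in p_x}

# Posterior over X once the conformation Y = I1 is known (Bayes' theorem).
p_I1 = sum(p for (_, y), p in joint.items() if y == "I1")
p_bound_given_I1 = joint[("bound", "I1")] / p_I1
```

The marginal `marg_x` reproduces `p_x` exactly, because each conditional is normalized separately for every X, while `p_bound_given_I1` differs from the prior value of 0.8.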



[Fig. 4 diagram. Top row of nodes: $F_X\mathbb{S}_x$, $F_XG\mathbb{S}_x\Omega$, $F_ZG^*\mathbb{S}_x\Omega$, $F_Z\mathbb{S}_x$. Middle row: $F_XX$, $F_XGX\Omega$, $F_ZG^*Z\Omega$, $F_ZZ$.
Bottom row: $F_XGXY$ with weight $w_{F_X}(X) w_G(Y;X)$ and $F_ZG^*ZY$ with weight $w_{F_Z}(Z) w_{G^*}(Y;Z)$.]

Fig. 4 Reaction diagram for adding conditional maximum entropy information. Partition functions, determined by likelihood
ratios for each transition, are written out for each state. For the 'forward' process $F_X\mathbb{S}_x \to F_XG\mathbb{S}_x\Omega$, there is a 'reverse' process
$F_Z\mathbb{S}_x \to F_ZG^*\mathbb{S}_x\Omega$ signifying the dual maximum conditional entropy problem.

Imposing the distribution among ion occupancy states given in Ref. [1] (shown for reference in Fig. 3f)
as $F_X$, application of this procedure to determine the conformational equilibrium shows that the channel is
almost always in the open state due to the high probability for occupancy of S2. The probabilities for $I_1$ and
$I_2$ are $2.3 \cdot 10^{-4}$ and $8 \cdot 10^{-6}$. Although $X|F_X$ is independent from $G\Omega$, knowledge of Y is still informative for
X, as

$$P(X|YF_XG\mathbb{S}_x) = \frac{P(X|F_X\mathbb{S}_x) \, P(Y|F_XGX\Omega)}{\sum_X P(X|F_X\mathbb{S}_x) \, P(Y|F_XGX\Omega)}. \qquad (37)$$

Using this method of inference, the occupancy distribution in the open state is shown in Fig. 3e. There is a
very slight increase in occupancy at S2 and a decrease at S3, but the effect is small because the open structure
is dominant. Note that our assumption that the free energies of Ref. [1] are averages over the conformational
states was chosen for illustration and may be incorrect.

We argue that addition of this type of conditional information is central to non-equilibrium statistical
mechanics. To derive an ensemble of trajectories, we add all possible transitions, Y, originating from each state,
X. The initial state and its transitions are linked by some information, G, which determines the distribution
of Y given X. This constraint determines a maximum entropy transition probability density, as considered in
differential form in Refs. [16, 40]. The hypothesis $F_X$ states that what we know about the starting distribution
is completely determined by $F_X$ and not by any possible, but unknown, future events. It is required for the
process to be non-anticipating in the sense that no information about processes we may carry out in the future,
$G\Omega$, is available from X.

There is a great deal of literature on methods for non-equilibrium statistical mechanics. Because this paper
is intended to show a new way of approaching problems, we will confine ourselves to deriving two main results
of the non-equilibrium theory. The first is the more recent development of fluctuation formulas for irreversible
entropy. Fig. 4 displays the duality between fixing $F_X$ at the initial time and fixing its propagated distribution
$F_Z$. In setting up an inference problem for Y starting from $F_XGX$, the distribution of Y is given by (36). If
this distribution is used to determine $F_Z$ using $P(Z|F_XG\mathbb{S}_x) = \sum_{XY} P(Y|F_XGX\Omega) P(X|F_X\mathbb{S}_x) I(Z = Z(Y))$,
some information loss occurs when $F_X$ is discarded and only $F_Z$ and information constraining the transitions
between states, G, are retained. Assuming the transitions, Y, specify both end-points X, Z, the distribution of Y
carries the complete information for this process. Using the information loss metric [13, 33],

$$L = -\mathcal{H}[F_XG\mathbb{S}_x\Omega | F_ZG^*\mathbb{S}_x\Omega]$$
$$= \sum_Y P(Y|F_XG\mathbb{S}_x\Omega) \ln \frac{P(Y|F_XG\mathbb{S}_x\Omega)}{P(Y|F_ZG^*\mathbb{S}_x\Omega)}$$
$$= \left\langle \ln \frac{P(Y|F_XGX\Omega)}{P(Y|F_ZG^*Z\Omega)} \frac{P(X|F_X\mathbb{S}_x)}{P(Z|F_Z\mathbb{S}_x)} \right\rangle$$
$$= \mathcal{H}_Z[F_Z\mathbb{S}_x|\mathbb{S}_x] - \mathcal{H}_X[F_X\mathbb{S}_x|\mathbb{S}_x] + \left\langle \ln \frac{P(Y|F_XGX\Omega)}{P(Y|F_ZG^*Z\Omega)} \right\rangle. \qquad (38)$$

The averaging is taken in the forward direction, and so $L \ge 0$ evidently represents the amount by which the
real distribution $F_XG \to XYF_XG$ contains information not present in a distribution guessed from $F_ZG^*$. Note
that if G allows only one-to-one XZ, the transitions are deterministic, and zero information is lost. More
generally, if forward and backward inference directions yield the same joint distribution so that $F_XG = F_ZG^*$,
then there is no way to discern the direction of time's arrow and no information is dissipated.
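
The information loss and its decomposition into an entropy change plus a flux term can be checked on a small Markov-style example. This is a sketch under simplifying assumptions: X and Z share one finite state space with a uniform reference distribution, a transition Y is identified with the pair (X, Z), and the forward kernel `T` and reverse kernel `T_rev` are arbitrary illustrative matrices rather than anything derived from equations of motion.

```python
import math

def info_loss(p_x, T, T_rev):
    """L = sum_{x,z} P(x) T[x][z] ln( P(x) T[x][z] / (P(z) T_rev[z][x]) ).

    p_x   : initial distribution P(X|F_X)
    T     : forward kernel, T[x][z] = probability of x -> z
    T_rev : reverse kernel, T_rev[z][x] = probability assigned to x given z
    Returns (L, p_z), with p_z the propagated distribution P(Z|F_Z).
    """
    n = len(p_x)
    p_z = [sum(p_x[x] * T[x][z] for x in range(n)) for z in range(n)]
    loss = 0.0
    for x in range(n):
        for z in range(n):
            fwd = p_x[x] * T[x][z]
            if fwd > 0.0:
                loss += fwd * math.log(fwd / (p_z[z] * T_rev[z][x]))
    return loss, p_z

def entropy(p):
    """Shannon entropy; equals H[p|uniform] up to the constant ln(number of states)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)

p_x = [0.7, 0.2, 0.1]
T = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
T_rev = [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6]]
loss, p_z = info_loss(p_x, T, T_rev)

# Flux term: forward average of ln(T / T_rev).
flux = sum(p_x[x] * T[x][z] * math.log(T[x][z] / T_rev[z][x])
           for x in range(3) for z in range(3))

# Deterministic one-to-one transitions with the exact inverse as reverse kernel.
perm = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
perm_inv = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
loss_det, _ = info_loss(p_x, perm, perm_inv)
```

In this setting `loss` equals `entropy(p_z) - entropy(p_x) + flux`, and `loss_det` vanishes, matching the remark that one-to-one transitions lose no information.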

The above relations are purely statistical, and have been stated in terms of maximum entropy constraints
for forward, G, and reverse, $G^*$, inference problems. They are generally valid for any choice of $G^*$. In
derivations of the fluctuation theorem [37], a particular choice of $G^*$ is made corresponding to time-reversed
equations of motion. The statistical perspective expressed here shows that this operation is confined to the choice
for $G^*$, and provides a suggestion as to the informational role of time-reversal. For example, the forward
constraints are consistent with the Langevin equation,
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so that the momentum change (Ap = pz~ Px) is normally distributed about F — yv to yield a Boltzmann 
distribution. The correct choice of G* is given by changing /3 to — j3 in the above equation. The equation for 
Brownian motion can be similarly derived by constraining Ax 2 with a~ 2 /2 and —AxF/2 with /3. In both of 
these equations, the same set of forward transitions are used for G*, but the sign of the Lagrange multipliers 
constraining the fluxes are reversed. We can thus intuitively see that reversing the sign of externally applied 
forces gives the correct fluctuation theorems using the information loss metric (Eq.l38b. This relation is valid 
in transient stochastic dynamics, and allows for entropy to increase both by increasing the entropy of the 
distribution (first part of Eq. [38) and by the presence of irreversible fluxes (last term of Eq. 138) . Such an 
informational perspective is required for understanding entropy increase for processes which to not have 
time-reversal symmetry, but nevertheless have well-defined and reproducible behavior. 
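
The claim that Gaussian momentum changes centered on $F - \gamma v$ lead to a Boltzmann distribution can be checked numerically. The sketch below is only an illustration under stated assumptions: a simple Euler discretization with time-step `dt`, a noise variance of $2\gamma\,dt/\beta$ (the usual fluctuation-dissipation choice, assumed here rather than taken from the text), and hypothetical function names. It iterates the exact variance recursion of the update at zero force and compares the result with the Boltzmann value $m/\beta$.

```python
import math
import random

def langevin_step(p, force, beta, gamma, mass, dt, rng=random):
    """Draw one momentum update: the change in p is Gaussian about
    (force - gamma * v) * dt, with v = p / mass."""
    mean = (force - gamma * p / mass) * dt
    std = math.sqrt(2.0 * gamma * dt / beta)  # assumed noise strength
    return p + rng.gauss(mean, std)

def stationary_variance(beta, gamma, mass, dt, n_steps=20000):
    """Iterate the variance recursion implied by langevin_step at zero force:
    var' = (1 - gamma*dt/mass)**2 * var + 2*gamma*dt/beta."""
    decay = 1.0 - gamma * dt / mass
    noise_var = 2.0 * gamma * dt / beta
    var = 0.0
    for _ in range(n_steps):
        var = decay * decay * var + noise_var
    return var

# Assumed toy parameters; the Boltzmann variance of p is mass / beta.
variance = stationary_variance(beta=1.0, gamma=1.0, mass=1.0, dt=1e-3)
sample = langevin_step(0.0, 0.0, 1.0, 1.0, 1.0, 1e-3)
```

For small `dt` the stationary variance approaches $m/\beta$, with a discretization error of order `dt`.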

Retaining only information about the end-points of a path $\Gamma = X_1 X_2 \cdots X_N$, from $F_1$ to $F_N$, we denote $\Gamma_i = X_1 \cdots X_i$ and $\Gamma'_i = X_i \cdots X_N$. We also assume constant $\mathbb{S}_X$ and conditional independence, $P(X_{i+1}|G\Gamma_i) = P(X_{i+1}|GF_i)$. If the transitions are known from $\Gamma$, the total dissipation is

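$$ dS/k_B = L = \mathcal{H}[F_N \mathbb{S}_X|\mathbb{S}_X] - \mathcal{H}[F_1 \mathbb{S}_X|\mathbb{S}_X] + \left\langle \ln \frac{P(\Gamma|G F_1 \mathbb{S}_X)}{P(\Gamma|G^{*} F_N \mathbb{S}_X)} \right\rangle , \qquad (39) $$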



where $k_B$ is the Boltzmann constant. This path functional is in agreement with the thermodynamic entropy production given by the ratios of forward and reverse path probabilities ]! 1112411371, as well as with an expression for entropy production deduced from mechanical considerations [6] when $\ln P(X_{i+1}|G F_i \mathbb{S}_X)/P(X_i|G^* F_{i+1} \mathbb{S}_X) = -\lambda g(x_{i+1}, x_i)$, with $g$ a generalized flux. We have derived this result from the direction of information propagation, and no special treatment has been given to the multiplier, $\beta$, defining the externally applied temperature. This derivation also avoids the complications associated with defining a steady-state. A curious feature is that it does not make specific reference to heat. This may be explained by noting that the transitions associated with fluxes, $g$, are probabilistic and represent interaction with an external system. These transitions may add or remove energy from our system, while the external system remains at a fixed thermostatic temperature state, $\beta_{ext}$. We then define the heat injected from the environment as the net energy gain, $\beta_{ext}\, dQ = \langle \lambda g(x_{i+1}|x_i) \rangle$. This identifies (39) with the Clausius form for the second law, 1 50 2 91I32B

$$ dS/k_B = dS_{int}/k_B - \beta_{ext}\, dQ \geq 0. \qquad (40) $$

The above claims relating transition probabilities to fluxes can be established for the Langevin and Brownian equations, and have been more thoroughly explored in a manuscript devoted to nonequilibrium problems [60].

The next result will be a derivation of the fluctuation-dissipation theorems from the Gibbs relations. Because our free energy for the process $A = F_{X_1} \mathbb{S}_{X_1} G_{12} \mathbb{S}_{X_2} G_{23} \ldots$ is simply the free energy for $F_{X_1} \mathbb{S}_{X_1}$, we must find an alternate free energy functional. The 'caliber' function of Jaynes,

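$$ \mathcal{H}[A \mathbb{S}_\Gamma|\mathbb{S}_\Gamma] = -\sum_\Gamma P(\Gamma|A \mathbb{S}_\Gamma) \ln \frac{P(\Gamma|A \mathbb{S}_\Gamma)}{P(\Gamma|\mathbb{S}_\Gamma)} \qquad (41) $$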
lends itself to the task by defining the Legendre transform 

N 
The first derivatives generate a 'first law' for non-equilibrium processes,

d & 

?3f -<"("» 

^=£(«,(I7))<JA,-. (43) 

i 

This is a path average conditional on $A\mathbb{S}_\Gamma$, but this notation has been suppressed for clarity. The second derivatives are the Green-Kubo formulae

(K> =-(S gi (Tl)8 g j(Tj)) (44) 
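
The relation between second derivatives and flux fluctuations can be illustrated on a toy distribution. The sketch below is an assumption-laden example, not the path-space calculation itself: it uses a single flux variable $g$ with three hypothetical values, a reference distribution `prior`, and an exponential weight $e^{-\lambda g}$, and compares a finite-difference derivative of the mean flux with the negative variance.

```python
import math

def flux_moments(lam, flux_values, prior):
    """Mean and variance of the flux under weights prior * exp(-lam * g)."""
    weights = [p * math.exp(-lam * g) for g, p in zip(flux_values, prior)]
    z = sum(weights)
    mean = sum(w * g for w, g in zip(weights, flux_values)) / z
    var = sum(w * (g - mean) ** 2 for w, g in zip(weights, flux_values)) / z
    return mean, var

def mean_flux_response(lam, flux_values, prior, h=1e-5):
    """Central finite-difference estimate of d<g>/d(lam)."""
    up, _ = flux_moments(lam + h, flux_values, prior)
    down, _ = flux_moments(lam - h, flux_values, prior)
    return (up - down) / (2.0 * h)

# Hypothetical toy values: flux of -1, 0 or +1 per step.
flux_values = [-1.0, 0.0, 1.0]
prior = [0.25, 0.5, 0.25]
lam = 0.3
mean_g, var_g = flux_moments(lam, flux_values, prior)
response = mean_flux_response(lam, flux_values, prior)
```

Under these assumptions the response of the mean flux to the multiplier equals minus the flux variance, which is the single-variable analogue of a fluctuation-dissipation relation.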



For the ion channel example we have been developing, a completely new set of constraints must be developed for transitions between states. For the forward problem, we are given $X_i$, as well as some set of feasible transitions, $Y|X_i$, from state $i$. Because the probabilities of inactivated states are negligible, we consider only the open channel state and single-jump transitions, as shown in Fig. 2 of Ref. O. Five transitions from each state are possible, corresponding to doing nothing, or to all sites moving up or down by the addition of a K$^+$ or a water at the appropriate end.

In order to produce a system that conserves energy, we place a constraint on the energy change at each 
step. 

e -P(E(X i+l )-E(Xi)) 

ny\^m= I ^ m ^m) (45) 

This amounts to a stochastic addition of energy to the system in the amount of (dE\Xj) = The 
steady-state distribution will differ from the canonical distribution in general because the normalization constant, $Z$, depends on $X_i$. This difference has come about because of the addition of information limiting which transitions are possible. If all states were available during each transition, the normalization constant would again be independent of $X_i$ and we would recover the canonical distribution. For the Langevin and Brownian equations with uniform applied temperature, the canonical distribution is also obtained because the normalization constant is independent of $X_i$.
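
The effect of restricting the allowed transitions can be seen in a small numerical sketch. It assumes transition probabilities proportional to $e^{-\beta(E(X_{i+1}) - E(X_i))}$ over the allowed target states, normalized separately for each starting state; the four energies, the neighbor lists, and all function names are hypothetical choices for illustration.

```python
import math

def transition_matrix(energies, neighbors, beta):
    """Row i holds P(j|i) proportional to exp(-beta * (E_j - E_i)) over the
    allowed targets j in neighbors[i], normalized per starting state."""
    n = len(energies)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        weights = {j: math.exp(-beta * (energies[j] - energies[i]))
                   for j in neighbors[i]}
        z_i = sum(weights.values())  # depends on i through the allowed set
        for j, w in weights.items():
            matrix[i][j] = w / z_i
    return matrix

def stationary_distribution(matrix, n_iter=5000):
    """Power iteration on the row-stochastic matrix."""
    n = len(matrix)
    dist = [1.0 / n] * n
    for _ in range(n_iter):
        dist = [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return dist

def canonical(energies, beta):
    weights = [math.exp(-beta * e) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]

# Hypothetical four-state example.
energies = [0.0, 1.0, 0.5, 2.0]
beta = 1.0
chain = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2, 3], 3: [2, 3]}  # limited moves
everything = {i: [0, 1, 2, 3] for i in range(4)}            # all moves allowed

limited = stationary_distribution(transition_matrix(energies, chain, beta))
full = stationary_distribution(transition_matrix(energies, everything, beta))
boltzmann = canonical(energies, beta)
```

In this example `full` coincides with the canonical distribution, whereas `limited` does not, even though both chains satisfy detailed balance with respect to their own steady states.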

Because transitions are not generally spontaneous, but may have an energy barrier, we add another constraint, $\beta E'$, directly on the number of transitions per time-step, $T$,

These barriers could, of course, be made to depend arbitrarily on the transition, $Y$, but for simplicity we assume that they are present only when a transition occurs and uniformly equal to the sum of 2 ps kcal/mol. The stochastic process specified by these two formulas has the identity matrix as the small time-step limit, and an equilibrium-like distribution as the large-step limit. The energy barrier assumption differs from the usual rate equation formulation, since the Chapman-Kolmogorov equation no longer holds. Instead, the behavior of the above system depends on the time-scale studied, reminiscent of fractal kinetic models [42]. Note also that $E'$ may be a function of the time-step, $T$, to give a specified average number of transitions and so recover a Markov model. Because this is a novel kinetic model, it remains to be seen how well these two constraints reproduce actual dynamics; however, the form of this equation matches well the nonlinearity near $t = 0$ in exact transition probabilities computed for the Müller-Brown potential surface (Fig. 4 of Ref. [70]), while variations in the surface chosen to divide states can be mimicked by changes in $E'$.

To finish our specification of non-equilibrium jump processes, we specify the forces on spontaneous ion creation and annihilation. Removing the possibility of a change in ion number unless an ion either enters or exits through an end of the channel, we can then specify the external force, $\mu$, acting on these special events using the same type of energy constraint (and assuming for simplicity the same energy barrier) as above. This leads to



with $dN_{int}$ and $dN_{ext}$ representing the number of ions added to the system ($\pm 1$) from the internal and external solutions, respectively. The form of this transition probability is similar to that of a recent paper on currents in boundary-driven Kawasaki dynamics [4], which were also analyzed using a cumulant-generating function similar to Eq. 42.

An outward-driving voltage can be added to the system by imposing an external field, increasing the likelihood for transitions moving ions outward by an amount $e^{\beta \Delta V g(Y)}$. The function g(Y) = <S— Xj + \) counts the number of ions taking a step outward during transition $Y$, consistent with the sign convention of Fig. 2. For ion movements internal to the channel, this has an equivalent effect on the path distribution as applying an energy constraint ($I(\cdot)$ is the indicator function). These constraints provide a complete kinetic model for our ion channel in arbitrary solution conditions and driving voltages.

The steady-state ion occupancies at zero applied voltage and $\mu$ identical to that for (e) and (f) of Fig. 3 are plotted in panel (g). The steady-state distribution is slightly altered from the local equilibrium prediction of (e). This happens despite the fact that the transition probability obeys detailed balance with respect to the steady-state, and exactly five transitions lead into each ion occupancy state. The reason is that the transition probability is normalized by a different value for the forward and reverse transitions.

As a final note, the current can be calculated as a perturbation from a steady-state using Eq. 44,
(48)
This gives the time-dependent linear response for small changes in the holding potential. The conductance near the resting potential is the time-integral of the steady-state current auto-correlation function (at zero average current), in accordance with Onsager's phenomenological equation [53]. The negative sign comes about because of the positive sign of the constraint ($\beta \Delta V$). At other voltages, this integral is the slope of the current/voltage curve. The presence of an additive constant explains why Onsager reciprocity only holds near equilibrium, where the fluxes are zero. Other Legendre transforms of Eq. 42 lead to relationships at fixed current, etc., as in the usual theory.

Fig. 5 Current-voltage plot calculated using the free energies from Fig. 2 of Ref. [1] along with the assumptions listed in the text. The voltage plotted in this figure is the sum of the five voltage steps between S0-S5. The integrated autocorrelation function is shown using tangent lines according to the FDT (Eq. 48). Although the reversal potential shifts are physically reasonable, inward rectification (opposite to known channel behavior) is observed.

The current-voltage characteristics calculated for this system are shown in Fig. 5. The fluctuation-dissipation theorem (Eq. 48) gives the slope of the current-voltage curve, and is plotted as a tangent line at each data point. Noticeable deviations occur at positive voltages due to numerical error in calculating the steady-state flux and the long-timescale behavior of the autocorrelation function. This has been traced to very long relaxation times ($O(10^5)$ steps) for the current, which are in turn due to the low transition probabilities between conduction states with high free energy barriers. The set of energy barriers used leads to larger current magnitudes at hyperpolarized voltages (inward-rectifying behavior), inconsistent with the known operation of the channel. It is of interest to model the transition energy barriers more accurately and to determine whether the time-dependence of dwell times for individual states is adequately represented by equations of the present, maximum entropy, form.

5 Conclusions 

This work has attempted to formulate the purely statistical content of statistical mechanics in terms of the Bayesian probability theory of Jaynes [33]. From this perspective, thermodynamics is a tool for understanding experimental information and its consequences. In the process, it has become clear that the principles and mathematical methods are of much more general applicability than conventional arguments would lead one to suppose [25], and that a large number of advanced concepts and methods can be synthesized in this way.

Entropy has been defined from the perspective of information theory, representing the (negative) information content of distributions. Because the entropy is maximized upon adding average value information, its first derivative with respect to variations in the distribution is zero. The first law of thermodynamics expressed in Eq. 28 is a direct consequence of this observation. We have shown that the process of adding average value information while maximizing the relative information entropy at each step is transitive. Therefore, adding a series of such constraints in any order will lead to the same distribution, with the sum of the information increments adding to the same value for all paths. Because the entropy was defined only relative to a reference distribution, the information increments are zero whenever the distribution is unchanged by maximizing entropy. Had this relative form been used to define the thermodynamic entropy, the third law of thermodynamics would not require special treatment of nuclear spin multiplicity at zero Kelvin.

Thermostatic partition functions, $Z[A\mathbb{S}]$, have likewise been identified as expressing relative probabilities. Changes in this function correspond to changes in information, and can be understood as a subjective probability assignment determining relative likelihoods between allowed alternative states of the system. Before specifying a set of alternate constraints ($\Omega$) that the system may choose between to reach statistical equilibrium, the partition function can only take this relative form, as in ($F \to G$) of Fig. 1. Once a complete set of constraints is specified, the partition function decides the relative probability of each state within $\Omega$, and it is possible to say (Eq. 34) that the probability of state $A$, divided by the probability of $\Omega$, is the probability of $A$ given that $A$ is in the set $\Omega$. This interpretation of the partition function leads naturally to multicanonical ensemble and umbrella sampling methods [9].

Comparisons between states of knowledge can be done using these functions, and the picture presented here does not require the specification of a complete set of all possible states of knowledge. Instead, the relations of Sec. 3 give a basic, consistent set of equations for defining the changes between these states. This set already justifies the appearance of (in)distinguishability factors in the partition function, as shown in § 4.1. We have provided a justification for the common indicator function (15), for comparing purely entropic changes in phase space, as well as the Boltzmann factor (25), for comparing changes in maximum entropy information, $P(F|\mathbb{S}_X)/P(G|\mathbb{S}_X) = Z[F\mathbb{S}_X]/Z[G\mathbb{S}_X]$. We have also shown two more advanced examples, generating the multicanonical ensemble in § 4.3 and a conditional maximum entropy in § 4.4. These are related to the first two examples as marginal distributions are related to conditional ones.

The concept of building up thermodynamic equations of state by adding system information is important for developing multi-scale understanding of large physical systems. Because this approach is based on using well-defined system states at each step, the predictions of the coarse-grained theory may be compared with a fully atomistic (or ab-initio electronic) molecular dynamics simulation or coarse-grained Monte-Carlo sampling. At such levels, the number of states will be greatly increased to include coordinates and momenta of all particles, with a change in the energy function to a more accurate approximation. Because this level of description quickly becomes computationally intractable, the approximate potential of mean force derived from high-level considerations may be useful for locating important states for detailed study, deriving stochastic boundary conditions, and applying force or energy biasing sampling techniques.

As is now well known, the statistical machinery outlined here is generally applicable to problems where there is uncertainty. It can be used equally well in reasoning about equilibrium and coarse-graining problems as well as non-equilibrium processes. Starting with a 'trajectory space' and adding information on allowed transitions as well as expectation values of fluxes between states leads to a state of knowledge about the process. In such a process, the ability to directly write down the equilibrium distribution (a long-sought goal lPJTl 511) disappears in the same way that a marginal distribution over coarse-grained variables cannot be directly produced from an equilibrium distribution over all atomistic coordinates and momenta. Instead, the transition distribution can be directly written, and the transient fluxes and eventual steady-state (if it exists) become path averages. A consideration of the information loss for stochastic processes leads to a formula similar to the second law of thermodynamics (39), applicable arbitrarily far from equilibrium. The information entropy functional of the path probability given in Sec. 3 takes on the definition of Jaynes' 'caliber' [28], while its Legendre transform (42) is a path free energy functional whose Gibbs relations easily generate Green-Kubo type fluctuation-dissipation theorems [27, 28, 47, 14]. We emphasize that these formulas are not required to be extensive or local [35, 36, 39], avoid the necessity of defining a steady-state [12, 66], and are independent of how we define fluxes, so that we do not have to immediately write down hydrodynamic equations [44]. The present work has given a necessary statistical foundation for extending these results by carrying over modern equilibrium techniques such as the evaluation of free energy differences [63] and coordinate/path re-weighting techniques [69, 49]. These formulas achieve Jaynes' goal of providing a "foundation for the predictive aspect of statistical mechanics, in which a single basic principle and method applies to all cases, equilibrium or otherwise" [26]. They imbue non-equilibrium and transient dynamic problems with the same structure as the equilibrium thermodynamics given by Gibbs [17], and open the door for a new understanding of processes far from equilibrium.

A Formal Derivation of Ratios for Undefined Quantities of Probability

There are several axiomatic foundations for probability theory, and the system of Kolmogorov is perhaps the most widely taught and well-known. This system begins by assuming a space of elementary states, $\mathbb{S}_Y = \{Y_1, Y_2, \ldots\}$. We assume that the $Y_i$ are mutually exclusive and exhaustive. In this case, a probability distribution function can be defined as a measure of a set, $P(X|\mathbb{S}_Y) = \sum_{Y_i \in X} P(Y_i|\mathbb{S}_Y)$, with $P(\mathbb{S}_Y|\mathbb{S}_Y) = 1$ and $P(Y_i|\mathbb{S}_Y) \geq 0 \; \forall Y_i \in \mathbb{S}_Y$. Any possible subset of $\mathbb{S}_Y$ then defines an aggregate 'state.' Because they are mutually exclusive, separate elementary hypotheses cannot be combined using the 'and' operation, only states, such that the probability of state one and state two is the probability of their intersection, $X_1 \cap X_2$. We have not yet made clear how this structure is related to logical inference.

Formal logic is concerned with proving logical statements from given assumptions. Both the assumptions and the statements to be proven can be stated in the form of logical sentences. Each sentence makes assertions about elementary hypotheses using some combination of the logical operators. For this paper, we assume the Boolean algebra, including the 'or' operation (+), as in $X_1 + X_2$ = '$X_1$ or $X_2$', the 'and' operation (*), as in $X_1 X_2$ = '$X_1$ and $X_2$', and the negation, $\bar{X}_1$ = 'not $X_1$'. Logical statements can be assigned a probability by defining states, $X$, as hypotheses of the form 'the system is in state $X$.' Each logical sentence then maps to a set of states by substituting union for 'or', intersection for 'and', and set complementation for 'not.' Assuming all statements are either true or false, aggregate operations may be defined from these, for example the mutually exclusive statement $A \oplus B = A\bar{B} + \bar{A}B$ and the implication $A \Rightarrow B = \bar{A} + B$. The probability of a given statement can then be determined from the probability of the set it implies. In order to make logical inferences from an assumed logical sentence, $\Gamma$, we require the definition of a conditional probability. This is found by changing the space $\mathbb{S}_Y$ to $\mathbb{S}_\Gamma$ via setting $P(Y_i|\Gamma\mathbb{S}_Y) = 0 \; \forall Y_i \notin \mathbb{S}_\Gamma$ and then re-normalizing, resulting in

$$ P(X|\Gamma\mathbb{S}_Y) = \frac{P(X\Gamma|\mathbb{S}_Y)}{P(\Gamma|\mathbb{S}_Y)}. \qquad (49) $$
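
The restrict-and-renormalize construction of the conditional probability can be sketched in a few lines. The example below is a hypothetical illustration (a fair six-sided die, with invented function and variable names) rather than anything drawn from the physical problem.

```python
def condition(prob, allowed):
    """Zero the probability of elementary states outside `allowed`,
    then renormalize over what remains."""
    norm = sum(p for y, p in prob.items() if y in allowed)
    if norm == 0.0:
        raise ValueError("conditioning sentence has zero probability")
    return {y: (p / norm if y in allowed else 0.0) for y, p in prob.items()}

def state_probability(prob, state):
    """Probability of an aggregate state, i.e. a set of elementary states."""
    return sum(prob[y] for y in state)

# Hypothetical elementary space: faces of a fair die.
die = {face: 1.0 / 6.0 for face in range(1, 7)}
even = {2, 4, 6}   # plays the role of the assumed sentence
high = {4, 5, 6}   # plays the role of the state X

conditional = state_probability(condition(die, even), high)
ratio = state_probability(die, high & even) / state_probability(die, even)
```

Both routes give the same number, 2/3 in this example, which is the content of the ratio form of the conditional probability.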

Although very easy to present, this system is unnecessarily restrictive for two reasons. The first is that it requires the definition of a complete space, $\mathbb{S}_Y$, at the outset, which no means of reasoning can later extend to add new hypotheses. As we have seen, inference can then only take place by successively reducing this space to smaller regions. By analogy, the process of statistical mechanics would therefore have to begin by assuming a multicanonical ensemble along with its mutually exclusive and exhaustive coordinates, and then derive successively constrained systems. Although a valid derivation can be produced this way, it appears to deny us the ability to define an isolated physical system. The second reason is related to this point. Physically, we would like to begin with the idea of an isolated system and then successively build in more complexity as relevant dynamic variables are discovered. The Kolmogorov system does not provide a means of reasoning about a hypothesis without first defining its 'space.' Instead of adding prior information to the right of the conditional sign, we would like to build up a complete picture of a physical system by successively moving prior information over to the left.

This point was considered by Jaynes [33], who showed that a probability theory 'without bounds' could be derived from three desiderata for assigning plausibilities to logical statements. The first was that degrees of plausibility be represented by positive, real numbers. The second and third require that all available prior information be used and that equivalent states of knowledge and reasoning processes lead to identical results. From these desiderata, the product rule (Bayes' theorem, $P(AB|C) = P(A|BC)P(B|C) = P(B|AC)P(A|C)$) may be deduced. Here, the only requirement is that $A$, $B$, and $C$ represent information and that the postulate, $C$, does not contradict itself. It is therefore unnecessary to define a space in which $A$ must exist in order to determine its plausibility from $C$.

One point is worth noting. A logical sentence of the form $\Gamma = A(B + C)(A \oplus D)$ immediately implies $A$ as well as denies $D$, and provides some information about the statements $B$ and $C$. However, it does not contain any information whatsoever about an unrelated proposition, $F$. With some thought, it can be seen that assignment of plausibilities based on a logical sentence must fall into one of four classes: true ($P(A|\Gamma) = 1$), possible ($P(B|\Gamma)$), undecidable ($P(F|\Gamma)$), or impossible ($P(D|\Gamma) = 0$). Probability theory is chiefly concerned with propositions that are possible. However, the product rule also applies to situations in which a proposition is undecidable.

What has been said serves to illustrate the difficulty of reasoning without assuming a set of mutually exclusive and exhaustive alternatives. To demonstrate this concretely, we will attempt to assign probabilities to a general logical statement, $\psi$, assuming only the principle of indifference, $I$, and possibly another logical statement, $\Gamma$. Obviously, the plausibility will be one if $\Gamma \Rightarrow \psi$ and zero if $\Gamma\psi$ constitutes a contradiction. The other situations are shown in the following example.

Consider the meeting of two gamblers who have, by means unspecified, come into possession of a Stern-Gerlach magnet. 
Being as they are, they decide to place wagers on measurements of a beta decay process. In order to decide the winner, 
both agree on the same method of classifying the measurement outcome. The most apparent measurement would be whether or 
not the following event occurred. 

A: An electron is observed in the time interval t,t + dt. 

However, they find themselves unable to assign $P(A|I)$ because of a large amount of uncertainty about the physics of the experiment. Then one of them notices that the device can tell them not only if an electron has been observed, but also if it has positive or negative spin. This changes their prior information for the problem, since they recognize that there are now two elementary hypotheses: $A$, observed with spin-up, or $B$, observed with spin-down. They are believed to be mutually exclusive, so that they know $(A \oplus B)$. According to the principle of indifference,

$$ P(A|(A \oplus B)I) = \frac{1}{2}. \qquad (50) $$

As their previous state of knowledge was unable to distinguish between these two events, it implicitly combined both of these two elementary hypotheses into a single event, which they held to be un-assignable. However, it seems that the principle of indifference should have some bearing on the question of $P(A|I)$, since in Aristotelian logic, $A$ must always be either true or false. Representing this two-valued foundation of Aristotelian logic as $L$, Cox has derived the sum rule, $P(A|LI) + P(\bar{A}|LI) = 1$. In this case, assuming $L$ is equivalent to assuming $A$ and $\bar{A}$ are mutually exclusive events and that one must occur, a situation represented by $(A \oplus \bar{A})$. In the case of Aristotelian logic, then, any reasoning on a proposition, $A$, on the left side of $(A|I)$, must be preceded by assuming $A$ and $\bar{A}$ are mutually exclusive and exhaustive on the right side. An inability to assign $P(A|I)$ based only on $I$ would then amount to some system of logic that does not begin by assuming $L$. This logical complication in part explains why the problem has not yet been directly discussed, as it requires us to reason about statements which are usually considered axiomatic using Aristotelian logic, $L$.

Is such a system possible, and if so, does it serve any useful purpose? Jaynes considered this relaxation to be required for reasoning about more vaguely defined propositions, such as whether a defendant did or did not exercise reasonable judgement in a medical malpractice suit. A corollary of the present question is the construction of a non-Euclidean geometry, which is indeed possible when one does away with the assumption that through one point, only one parallel may be drawn to a given straight line [55]. We argue by analogy that the product rule is more fundamental than the sum rule, and that the most important use of removing the rule $(A \oplus \bar{A})$ is to make explicit the assumptions on how logical propositions must inter-relate. For example, if $A$ represents the proposition that a defendant exercised reasonable judgment, both $A$ and $\bar{A}$ may be held to be absolutely true, but for different choices made by the defendant. In order to make definite conclusions, however, it will be necessary to define a set of mutually exclusive hypotheses, for example by enumerating individual actions and measurable ethical standards. Once a set of mutually exclusive hypotheses is defined, a problem of deciding plausibilities in the absence of this assumption may be reduced to one in Aristotelian form.

In addition to assuming the product rule, it will be necessary to define a set of operations in a reduced Boolean algebra where the plausibility of $A\bar{A}$ may be nonzero. The contradiction in this statement disappears when $\bar{A}$ is defined to be a new proposition, say $B$, independent of $A$ unless some prior information is present relating the two. It would seem that by thus removing the operation of negation, a reduction to Aristotelian logic is always possible. More precisely, the proposed system of logic, $\phi L$, contains the conjunction and disjunction in the usual sense, but not negation. In order to equate $\bar{A}$ and $B$ in the Aristotelian sense, we must then know that $A$ and $B$ are mutually exclusive and that one or the other is always true. Because of this property, statements in $\phi L$ cannot be disproven unless some relations between them are first assumed.

We thus add the further relations, '$\oplus$' to mean that two propositions are mutually exclusive and exhaustive, '$\Rightarrow$' to mean that the left proposition is logically equivalent to the conjunction (i.e. $A(A \Rightarrow B) \Leftrightarrow AB(A \Rightarrow B)$), and '$\Leftrightarrow$' to mean that two propositions are logically identical. The Aristotelian expansions, $A \Leftrightarrow B = AB + \bar{A}\bar{B}$ and $A \oplus B = A\bar{B} + \bar{A}B$, may not hold in general, and in their place, $\Rightarrow$ and $\Leftrightarrow$ define the set of substitution rules which may be used. The principle of contradiction is thus $P(AB|A \oplus B) = 0$. No contradiction can be deduced without the mutual exclusivity clause, thus P(A|A <^=> A((j)L)) is undefined, whereas P (A | (A A)(A+A)) = 1 . It is also evident that $A \Leftrightarrow A$ is always to be assumed. From the product rule, $P(A|C) = P(AA|C) = P(A|AC)P(A|C)$, so that the syllogism is likewise reduced to $P(A|AC) = 1$, irrespective of whether or not $C$ contains $\bar{A}$.$^2$ At this point, all that can be said of the disjunction is that $A + A$ is equivalent to $A$, that $A \Rightarrow A + B$, and that $P(A + B|C) \geq P(A|C)$.

Note that $\phi L$ is consistent, since any sentence, $\psi$, in $\phi L$ can be converted to one, $\psi'$, in $L$ by symbolically re-labeling elementary propositions such that $\psi$ is true if and only if $\psi'$ is true and $\psi$ is reducible to a contradiction if and only if $\psi'$ is so. The construction of $\psi'$ may be accomplished simply by replacing all negated elementary propositions (only individual literals may be negated in $\phi L$) with new elementary propositions. The rules of Aristotelian logic for this sentence are in one-to-one correspondence with those of $\phi L$. Note that distributing any negations for an expression in $L$ and adding to this expression an Aristotelian clause, $(A \oplus \bar{A})$, for each elementary proposition that appears constitutes the reverse transformation.

We now show that it is admissible to use the product rule to completely expand Eq. 50 for comparison to $P(A|I)$.

$$ \frac{1}{2} = P(A|(A \oplus B)I) = \frac{P(A(A \oplus B)|I)}{P(A \oplus B|I)} = \frac{P(A|I)}{P(A \oplus B|I)} $$

By defining a set of possible assignments, assuming $A \oplus B$ reduces statements about $A$ and/or $B$ to Aristotelian form. From this example it is evident that the only information required to assign a probability using the principle of indifference is the number of elementary hypotheses which may be measured. For $A(A \oplus B)$, there is only one hypothesis, and it is formally undecidable. In contrast, $A \oplus B$ implies that there are two possibilities, since there are two elementary hypotheses, $A$ or $B$, making this expression true. Therefore, we define the principle of indifference as one basing its determination of plausibility completely on the number of distinct truth assignments which may confirm a logical expression in $\phi L$. In the case of only one, undecidable assignment (represented as $\phi$), the principle of indifference gives an unknown constant,

$$ P(A|I) = \text{const.} = P(\phi|I). \qquad (51) $$

According to the product rule, the principle of indifference must therefore assign a likelihood to compound propositions as $P(A \oplus B|I) = 2P(\phi|I)$.

In general, a logical sentence, $\Gamma$, represents some assumptions on certain, contradictory, possible, and identical hypotheses. Writing the set of literals contained by $\Gamma$ as $x_1, x_2, \ldots, x_n$, a basic set of hypotheses, $\mathbb{S}_Y$, consists of the $2^n - 1$ conjunctions from all possible (non-null) combinations of the $x_i$. However, only some subset, $\Omega$, of $\mathbb{S}_Y$ will be possible given $\Gamma$. We may use the principle of indifference to assign likelihoods over this space as

$$ P(\psi|\Gamma I) = \frac{1}{|\Omega|}, \quad \psi \in \Omega \subset \mathbb{S}_Y . $$

2 However, it does not make sense to admit logically contradictory prior information such as $AB(A \oplus B)$.

Next, we rigorously define $\Omega$ as the set of all conjunctions of $x_i$ (i.e. $\psi \in \mathbb{S}_Y$) that raise the status of $\Gamma$ to certainty ($\psi \Rightarrow \Gamma$, so that $P(\Gamma|\psi I) = 1$). Other $\psi'$ will either be undecidable, with no relevance to $\Gamma$, or contradictory ($P(\Gamma|\psi' I) = P(\psi'|\Gamma I) = 0$); neither will contribute to Eq. [5] Since each literal is represented as a unique $x_i$, we may visualize the set of conjunctions in the usual sense of a truth table, where false is taken to mean 'not present.' The product rule is now sufficient to show that

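$$ P(\psi|\Gamma I) = \frac{1}{|\Omega|}, \quad \psi \in \Omega(\Gamma) $$

$$ = \frac{P(\Gamma\psi|I)}{P(\Gamma|I)} = \frac{P(\Gamma|\psi I)\,P(\psi|I)}{P(\Gamma|I)} = \frac{P(\phi|I)}{P(\Gamma|I)} $$

$$ \Rightarrow P(\Gamma|I) = |\Omega|\, P(\phi|I). $$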
We note that $\Omega$ often takes the form of a product space, e.g. for $\Gamma = (\oplus(A,B,C,\ldots))(\oplus(A',B',C',\ldots))(\cdots)$. In this expression, we have defined $\oplus(\cdot)$ as expanding to a conjunction of exclusive-ors on all pairwise combinations in its argument, so that only one expression from the argument set may be true at once. When $\Omega$ is explicitly present as an assumption, we may define each element of $\Omega$ as an elementary state to give the Kolmogorov system of probability, in which the sum rule,

$$ P(\Omega'|\Omega I) = \sum_{\psi \in \Omega'} P(\psi|\Omega I) = \frac{|\Omega'|}{|\Omega|}, \quad \Omega' \subseteq \Omega, \qquad (52) $$

becomes valid. However, it should be noted that the set $\Omega$ is not itself elementary, but instead constructed from elementary hypotheses of the form '$x_i$ is true,' etc. If the prior information, $\Gamma$, does not state that $x_1$ and $x_2$ are mutually exclusive, for example if $\Gamma = x_1 + x_2 = (x_1 + x_2)(x_1 + x_2) = x_1 + x_2 + x_1 x_2$, then $\Omega = \{x_1, x_2, x_1 x_2\}$. In the present paper, we use $\Omega$ instead of $\Gamma$, since we have not proved that the reverse mapping $\Omega \to \Gamma$ is unique.
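
The counting rule $P(\psi|\Gamma I) = 1/|\Omega|$ can be made concrete with a short enumeration. The sketch below assumes that a sentence is supplied as a Python predicate over its literals and that each non-null truth assignment stands for one conjunction, with false read as 'not present'; the function names are hypothetical.

```python
from itertools import product

def confirming_assignments(sentence, n_literals):
    """Non-null truth assignments (tuples of bool) that make `sentence` true.
    Each assignment stands for the conjunction of its true literals."""
    return [bits for bits in product([False, True], repeat=n_literals)
            if any(bits) and sentence(*bits)]

def indifference_probability(sentence, n_literals):
    """Probability assigned to each confirming conjunction: 1 / |Omega|."""
    omega = confirming_assignments(sentence, n_literals)
    return 1.0 / len(omega)

# Inclusive 'or': x1 + x2 admits x1, x2 and x1 x2.
omega_or = confirming_assignments(lambda a, b: a or b, 2)

# Mutually exclusive alternatives: A xor B admits only A or B alone.
omega_xor = confirming_assignments(lambda a, b: a != b, 2)
p_spin_up = indifference_probability(lambda a, b: a != b, 2)
```

With the inclusive 'or' the set contains three conjunctions, while the exclusive form contains two, so each spin outcome in the gamblers' example receives probability 1/2.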